(54) DATA PROCESSING SYSTEM



(71) We, NCR CORPORATION, formerly The National Cash Register Company
of Dayton in the State of Ohio, and Baltimore in the State of Maryland,
United States of America, a corporation organized under the laws of the
State of Maryland, United States of America, do hereby declare the
invention, for which we pray that a patent may be granted to us, and
the method by which it is to be performed, to be particularly described
in and by the following statement:
This invention relates to data processing 
systems. 

According to the present invention, there is provided a data processing
system when conditioned by programming means which includes translation
means arranged to cause the data processing system to translate a
source program in a high level source language into an intermediate
language which is a high level language independent of the source
language, said translation means being effective, in operation, to
separate executable statements (as herein defined) from non-executable
information (as herein defined).

It will be appreciated that a data processing system according to the
immediately preceding paragraph has the advantage that, since the
intermediate language is independent of the source language, the same
program in different high level languages can be arranged to be
translated into the same intermediate language program. This is
advantageous because a single compiler can then be arranged to compile
from the intermediate language into a low level language.

It should be understood herein that a "high level language" is a
programming language wherein each instruction or statement generally
corresponds to a plurality of machine code instructions. Examples of
high level languages are ALGOL, FORTRAN, COBOL and PL/1. A "low level
language" is a programming language wherein each instruction generally
corresponds to a single machine code instruction. The process of
converting a program from a high level language into a low level
language is generally referred to as "compiling", and a program which
effects such a conversion is generally referred to as a "compiler".

It should also be noted that by an "executable statement" herein is
meant a program statement which specifies an operation to be performed
on data. By "non-executable information" is meant information which
does not specify an operation to be performed on data; for example,
information specifying the characteristics of data.

One embodiment of the invention will now be described by way of example
with reference to the accompanying drawings, in which:

Fig. 1 is a block diagram showing the various stages involved in a
computer system using a generalized compiler;

Fig. 2 shows a block diagram of a specific embodiment wherein any of a
number of commonly used source languages and any one of a class of
computers are provided with language translators making them adaptable
for use with the generalized compiler;

Fig. 3 is a schematic drawing of the sequential steps used in making a
source program available for a specific host computer;

Fig. 4 is a schematic showing how the various manifestations of the
source program are acted upon by elements of the computer system in
order to transform them into advanced stages or conditions for final
use of the host computer;

Fig. 5 is a schematic drawing laying out the structure of an
"executable" statement in the format of a First Intermediate Language;

Figs. 6 and 7 are exemplary tree schematics illustrating information
hierarchies in a new Metalanguage;

Fig. 8 is the organizational chart showing how the set of the major
information structures called "groups" is subdivided into major subsets
each of which is successively further subdivided into two smaller
subsets;

Fig. 9 is an organization chart showing how the set of all elements of
information, e, is subdivided into two major subsets and how
these smaller subsets forming a hierarchy of groups of elements of
information.

Fig. 10 is an illustration of the use of the new Metalanguage to
specify one type of IF statement. The information unit, Eln, is shown
expanded into successively lower levels of complexity of information.

Figs. 11 through 14 are drawings of various configurations of computer
systems incorporating the generalized compiler concept.

Description 
The large scale integration (LSI) phenomenon in semiconductor
technology is having a far reaching impact on the entire information
processing industry. The availability of inexpensive and reliable
hardware building blocks containing complex logic circuits opens
realistic avenues for the development of more efficient computing
systems. The ever increasing cost and complexity of computer software,
on the other hand, necessitates a new look at computer system
architecture and the hardware-software interrelationship.

Program compilation often requires more computer time than execution of
the program. A significant amount of computer power is spent on the
translation of high level programming languages into machine language.
Compilers have increased in size from 10 to 40 K words for FORTRAN or
COBOL up to and over 195 K words for one version of FORTRAN. Hours of
expensive computer time are consumed by the compiler. Other
inefficiencies are caused by such tedious operations as compiler and
program debugging and maintenance. In multiprogramming and time-sharing
systems, compiling causes significant system overhead due to increased
program swapping. Computers require larger main memories because of the
core requirements of compilers.

Generally, compiler costs are an exponential function of the size of
the compiler task which is incorporated into one integrated unit.
Further growth in the complexity of the source language is severely
limited by the increasing size and cost of the compiler using current
compiler techniques. By dividing the compiler tasks into independent
isolated modules which separate the user problem transformation task
from the language syntax and semantics task, these limitations can be
reduced and the compiling task simplified.

Current compilers do not attempt to separate language translation tasks
from problem transformation, and consequently any change in either the
source language or the machine code can cause changes throughout the
compiler. Source languages and computer hardware are in a constant
state of flux, and as a result a new compiler must be written for each
source language each time a new computer is designed. Much of this work
can be avoided by recognizing and taking advantage of the fact that
compiler tasks can be separated into two independent classifications:
language translation and problem transformation. By keeping these
groups of tasks separated in the compiler, the impact of a change in
the source language or a change in the computer hardware can be limited
to the language translation portion of the compiler. The part of the
compiler concerned with the transformation of the user's problem need
not be changed, because changing the language used to state the problem
does not change the problem.

The generalized compiler herein is based on clearly separating
"language translation" from "problem transformation". If a source
program is defined as the combination of a problem algorithm and the
exposition of that algorithm in some source language, then the problem
transformation consists of the mapping of the problem algorithm from a
complex function-oriented format into a simpler operation sequence
format. Language translation consists of changing the format of the
exposition of the algorithm. To accomplish this, the source language is
first translated into an intermediate language (IL), which has been
designed to be independent of source language.

Then the problem, in intermediate language, is transformed from a set
of procedures and algebraic expressions into a sequence of computer
operations. Finally a second language transformation is performed to
convert the above machine-independent output into operation codes for a
specific computer. By designing the compiler as three independent
modules, the impact of a change is limited. Thus, a change in source
language only affects the first module, the source language translator.
Likewise, the introduction of a new computer with a new machine code
instruction set simply requires the change or replacement of the second
language translator. In either case, the problem transformation portion
of the compiler and one language translator remain unchanged. If a new
computer using a new implementation language is introduced, the
compiler may have to be translated into the new implementation
language. Even this task is considerably smaller than translating a
conventional compiler.

A large part of the compiler is in intermediate language which is
independent of both the source language and the specific computer used
for program execution. Consequently, it may be feasible to implement
this part of the compiler in hardware since it is reasonably
insensitive to changes.

Intermediate Language 

In order to separate the compiler tasks into independent modules and
thereby reduce the complexity, problem transformation is separated from
language translation. To do this, it is necessary to be able to state
the problem in a language which is independent of the source language.
This is the purpose of having an intermediate language (IL). This also
defines the nature of the IL. The IL can be divided into two
independent parts: nonexecutable and executable. The nonexecutable part
essentially specifies the characteristics of the data and the
executable part specifies the functions to be performed on the data.

All nonexecutable source program information is passed to the
generalized compiler by means of tables which are collectively called a
symbol table. Each identifier in the source program occupies one symbol
table word. Various attributes of this identifier are coded into the
proper bit positions in this word during the language translation of
the source language. The "identifiers" are nounal items such as A, B, C,
or X, Y, Z which identify the subject matter in a given algorithm. For
example: pay = hours-worked x hourly-rate, or C = A x B. Thus A, B, C,
pay, hours-worked, and hourly-rate are identifiers.
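
As an illustration only, the symbol table idea can be sketched in
Python. The class name, the dictionary layout of an entry, and the use
of a list index as the symbol table address are assumptions made for
this sketch; the specification states only that each identifier
occupies one symbol table word in which its attributes are coded.

```python
class SymbolTable:
    """Holds one entry per distinct identifier (hypothetical layout)."""

    def __init__(self):
        self.entries = []   # list index doubles as the symbol table address
        self.index = {}     # identifier name -> address

    def intern(self, name, **attributes):
        """Return the address of `name`, adding an entry only if it is new."""
        if name not in self.index:
            self.index[name] = len(self.entries)
            self.entries.append({"name": name, "attrs": {}})
        addr = self.index[name]
        self.entries[addr]["attrs"].update(attributes)
        return addr


def replace_identifiers(tokens, table, operators=("=", "+", "-", "*", "/")):
    """Replace every identifier token with its symbol table address."""
    return [tok if tok in operators else table.intern(tok) for tok in tokens]


table = SymbolTable()
statement = ["pay", "=", "hours-worked", "*", "hourly-rate"]
encoded = replace_identifiers(statement, table)
```

In this sketch `encoded` carries addresses in place of names, so later
stages can look up attributes without knowing how an identifier was
spelled in the source language.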

The executable part specifies the functions or operations required and
the sequence in which they are to be performed. The same function may
be specified in a different way in each source language. However, the
IL need provide only one way of specifying a specific function. The
total number of functions described in IL will be greater than the
number incorporated into any one source language since the various
source languages do not all encompass the same set of functions.
Likewise, the IL requires a larger set of operators than any single
source language. In the practical implementation herein proposed, the
IL input to the generalized compiler is designed to be language
independent for COBOL, FORTRAN and PL/1, thereby encompassing nearly
all higher level programs being written at present. Once this concept
of an IL is established, it can be expanded by the later addition of
more functions and more operators without the complications such
changes would cause in current compilers. In fact, thinking of source
language in terms of new problem functions may be a great aid in the
more orderly development of a more powerful, simple to use, generalized
source language. In addition, the separation of data attributes is
compatible with the concepts of a common data base.

Source Language Translator 

In order to separate the problem transformation task from the language
translation, the problem must be stated in a manner that is source
language independent. This is the task of the source language
translator. The source language translator (SLT) strips away the excess
verbiage, redundancy, and idiosyncrasies from the source language. (One
SLT is required for each source language.) It recognizes source
language operator codes and keywords and translates these, as well as
executable procedures and algebraic expressions, into intermediate
language (IL-1 format). It places all names and declared attributes of
the data and the environment into the symbol table, replacing the names
(operands) in the executable program with symbol table addresses. The
IL-1 input format for expressions may consist of alternating operands
and operators. If an operand is not present between two operators in a
source language expression, the SLT supplies a special "no operand"
code. This is one method of eliminating a duplication of language
translation within the generalized compiler.

Source languages with a precedence algorithm different from the IL
precedence algorithm are translated by the addition of parentheses by
the SLT.

Generalized Compiler 

A Problem Transformation Module (Generalized Compiler, GC) is used to
transform the executable procedures and expressions (which are in a
standard IL-1 format independent of the source language) from a problem
or function oriented form into a computer operation form consisting of
a sequence of machine level instructions performed on elementary
variables. The GC reorders the sequence of the operations of an
expression according to a commonly used precedence algorithm. In
addition to the expression processor routine, the GC contains routines
for the transformation of all other executable statements. Appropriate
code sequences are generated to transfer control to functions and
sub-routines and to return, as well as to test DO loop conditions and
to increment DO loop variables.

Machine Code Translator

The Machine Code Translator (MCT) is a simple language translator to
translate the output operation codes of the GC into the machine codes
of a specific computer. It combines symbol table information and
machine code skeletons with the output (operand-operator-operand)
triads of the GC.

System Organization 

Fig. 1 shows the system organization whereby the Source Program (in a
high-level language such as FORTRAN, COBOL, etc.) is handled by a
Source Language Translator 10 (SLT), which separates executable
statements from non-executable information, and is converted to a First
Intermediate Language (IL-1). The data from the SLT 10 is then handled
by the Generalized Compiler 11 (GC), which converts the statements into
a Second Intermediate Language (IL-2), which is then handled by the MCT
14 in order to convert the program information into a specific Object
Machine Code for the host computer.

The GC 11 provides the function of "problem transformation" with a
Problem Transformation Program 12 and the function of symbol
translation with Symbol Table 13.

Fig. 2 shows a more specific system where each individual high level
Source Language, such as C, F, or A, is handled by specific Source
Language Translators 10c, 10f, 10a, etc., to convert information into
IL-1. Then GC 11 provides the program information in IL-2 to specific
Machine Code Translators such as 14m1, 14m2, or 14m3, depending on the
specific make of the host computer.

Source Language Translator Tasks 

The four principal tasks of the Source Language Translator are:

1) Identify all source language symbols.

2) Create a symbol table of all data attributes.

3) Translate all executable source language statements into
intermediate language.

4) Detect source language syntax errors.

All source language symbols must be recognized. The source language
character set is translated as necessary to conform to the IL code
format. Each delimiter (a symbol which communicates "separation" of
items of information for control purposes) is replaced by its
equivalent IL code. Each identifier is added to the symbol table unless
it is already listed. In either case, each identifier is replaced by
its symbol table address in all executable statements.

In terms of sequences, Fig. 3 illustrates the program conditions (a),
(b), (c), (d), where the Source Program, in a High Level Language, is
converted to a First Intermediate Language, then to a Second
Intermediate Language, and finally to Object Machine Code, where it is
usable by the host computer.

Fig. 4 illustrates the points of impingement of: the SLT 10 program on
the Source Program (a) to convert it to condition (b); the GC 11
program, which converts the Source Program from First Intermediate
Language condition (b) to Second Intermediate Language condition (c);
and the MCT 14 program, which converts the Source Program from Second
Intermediate Language condition (c) to Object Machine Code condition
(d).

Symbols in a Source Language may be categorized to include those shown
in the following Table 1.

(A) Delimiter
    1. Operator
       a. Primitive (+, -, *, .GT., etc.)
       b. Keyword Executable Command (DO, GO TO, MOVE, etc.)
    2. Punctuation (separation, control)

(B) Identifier
    1. Keyword: Program structure name
       (procedure, block, division, section, etc.)
    2. Statement Label
    3. Variable (includes constants, literals)
    4. Function
    5. Subscript
    6. Argument
    7. Expression

(C) Comment

(D) Invalid Symbol or Keyword

All the data declaration and environment description information is
placed into the Symbol Table 13. Comments are deleted. The symbol table
takes the place of all nonexecutable statements when the source program
is in intermediate language.

Each executable source language statement is translated into
intermediate language. Essentially this consists of eliminating the
redundancy and idiosyncrasies and reordering the information into a
language independent format. Keywords are replaced by the equivalent IL
code, which always occurs as the first word of a statement. Arguments
and parameters appear in a specified order separated by commas or some
other delimiter but without the memory and readability aids found in
source languages.

An "end of expression" code is added to the end of each statement
function expression, and the name of the expression is stored in a
special area in the symbol table 13 so that the symbol table address
which replaces the name in the Source Program is identifiable as an
expression name. Statement functions (FORTRAN) and similar source
language devices specify the solution of an equation each time the name
of the expression appears. The IL code for "NO OPERAND" is inserted
between every pair of operators in the Source Program which are not
separated by an operand, as necessary to avoid duplicating syntax
recognition in the generalized compiler.
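
A minimal sketch of the "NO OPERAND" insertion, written in Python for
illustration, follows. The token spelling `<NO_OPERAND>`, the operator
set, and the function name are assumptions of this sketch and are not
taken from the specification; only adjacent operator pairs are handled,
as the paragraph above describes.

```python
NO_OPERAND = "<NO_OPERAND>"          # hypothetical spelling of the IL code
OPERATORS = {"+", "-", "*", "/"}     # hypothetical operator set


def insert_no_operand(tokens):
    """Insert NO_OPERAND between any two operators with no operand between them."""
    out = []
    prev_was_operator = False
    for tok in tokens:
        is_operator = tok in OPERATORS
        if is_operator and prev_was_operator:
            out.append(NO_OPERAND)
        out.append(tok)
        prev_was_operator = is_operator
    return out


padded = insert_no_operand(["A", "*", "-", "B"])
```

After this pass the expression strictly alternates operands and
operators, so a later stage can rely on that shape without repeating
the syntax analysis.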

Each source language statement translated into intermediate language is
consecutively numbered. Invalid characters are noted and converted into
blanks. Error routines are invoked for invalid keywords and other
syntax errors. Parentheses are added as needed to convert the
precedence implied in the source language into the precedence
conventions of the intermediate language in case a difference exists.

Source language syntax errors such as illegal characters, delimiters,
keywords, and identifiers are detected.

Any of the currently used techniques for recognizing and translating
source language can be adapted for this computer system.
Syntax-directed compiling and table driven compiling are the most
common techniques used. The characteristics of the newly invented
system are:

1. The translation is much simpler because it is a translation into
another high level language rather than a translation into machine
code.

2. The impact of a change in the source language is limited to a small
part of the compiler system rather than affecting the entire computer
system.

Problem Transformation Tasks

The principal tasks of the problem transformation module 12 are to:

1) Transform problem functions into computer operation sequences.

2) Optimize the object code generated.

3) Detect errors.

The executable statements are transformed from high level
function-oriented format into computer hardware operation-oriented code
by rearranging the sequence in which the operations appear according to
the operator precedence rules of an intermediate language created as
part of the system. The intermediate language operates such that:

(a) the keyword in each statement directs the problem transformation
control to the routine required to transform that statement. For many
statements, a direct transformation can be made using a code skeleton
and substituting arguments from the statement for dummy variables.

(b) an expression processor is used to transform expressions within the
statement. This processor uses one or more push-down stacks to aid in
reordering expressions to meet the precedence criteria established. A
push-down stack is also used to permit the use of recursion within
expressions. Each time a triad (two operands and an operator)
encounters an operator of lesser precedence, the triad is removed from
the expression as IL output code. The triad is replaced in the
expression by the symbol table address which specifies where the triad
output code operation result is stored.
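
The expression processor described in (b) can be sketched, under stated
assumptions, as a conventional operator-precedence reduction with two
push-down stacks. The precedence values, the temporary names `T1`,
`T2`, ..., and the choice to reduce on equal as well as lesser
precedence (giving left-to-right evaluation) are assumptions of this
sketch; in the system described, the temporary names would be symbol
table addresses.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}   # hypothetical IL precedence


def to_triads(tokens):
    """Reduce an alternating operand/operator list to a sequence of triads.

    Each emitted item is (result, left, operator, right); `result` names
    the place where the value of the triad is stored.
    """
    operands, operators, triads = [], [], []

    def emit():
        right = operands.pop()
        left = operands.pop()
        op = operators.pop()
        result = f"T{len(triads) + 1}"
        triads.append((result, left, op, right))
        operands.append(result)      # the triad is replaced by its result

    for tok in tokens:
        if tok in PRECEDENCE:
            # A stacked triad meets an operator of lesser (or equal)
            # precedence: remove it from the expression as output code.
            while operators and PRECEDENCE[operators[-1]] >= PRECEDENCE[tok]:
                emit()
            operators.append(tok)
        else:
            operands.append(tok)
    while operators:
        emit()
    return triads, operands[0]


triads, value = to_triads(["A", "+", "B", "*", "C", "-", "D"])
```

For the expression A + B * C - D this yields the multiplication first,
then the addition, then the subtraction, each referring to earlier
results by name.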

The object code is optimized by efforts such as the elimination of the
repeated execution of the solution of the same expression whenever the
values have not changed in the interim. The assignment of index
registers to expedite address computation is another optimization
technique. If the number of registers is limited, optimization will
dictate that the innermost loops of a nest of DO loops receive
preference over outer loops. Conflicts may also occur between local and
global optimization.

Error detection includes finding illegal sequences of operands and
operators, which the expression processor will detect. In addition, the
various keyword statement routines will detect missing arguments for
which no alternatives are specified and for which default values are
not provided.

Second Language Translator Tasks

The principal Second Language Translator tasks are:

1) Translate the problem transformation output (triads) into machine
code for a specific machine using code skeletons and symbol table
information. This is accomplished by the Machine Code Translator 14 of
Fig. 1.

2) Decide data type dominance for operations on operands of two
different data types. Generate code needed to convert data type.

3) Assign storage locations to all object code identifiers in the
symbol table. Replace the symbol table addresses of these identifiers
with the assigned storage locations.

The purpose of the second language translator is to convert the
machine-level machine-independent problem transformation output triads
and other machine-independent instructions into instructions (primitive
code) for a specific machine. The important point to keep in mind is
that the only impact necessary on a compiler due to a machine code
change is a language translation. Therefore, the compiler should be
designed so that parts of the compiler not involved in machine code
generation are independent of the features of a specific machine. This
will ensure that the impact of a change to a new machine will be
strictly limited to the code generation skeletons. It is the job of the
code generator to provide the object code necessary to execute the
program instructions on a specific computer. To do this, the code
generator uses the combined information of the output of the Problem
Transformation Module, Expression Processor, the information in the
symbol tables, and the code skeletons.

The code skeleton selected depends not only on the particular machine
instruction as specified by the operator (symbols used to represent the
"action" to be done on the identifiers, or operands) but also on the
attributes of the data involved. The attributes of the data are
specified or implied for each operand in the symbol table at the
address which is the value of the operand. Each symbol table attribute
field may be considered to be part of a micro-program instruction
needed for a particular combination of attributes.
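
By way of a hedged illustration, skeleton selection keyed on both the
operator and a data attribute might look as follows in Python. The
mnemonics (`LOAD`, `FMPY`, and so on), the single `type` attribute, and
the table layout are invented for this sketch and do not describe any
particular machine or the actual skeleton format.

```python
# Hypothetical code skeletons: one per (operator, data type) combination.
SKELETONS = {
    ("*", "fixed"): ["LOAD {left}", "MPY {right}", "STORE {result}"],
    ("*", "float"): ["FLOAD {left}", "FMPY {right}", "FSTORE {result}"],
}


def generate(triad, symbol_table):
    """Expand one triad into machine-specific instructions.

    Operands in the triad are symbol table addresses; the attribute
    found at the left operand's address selects the skeleton.
    """
    result, left, op, right = triad
    data_type = symbol_table[left]["type"]
    skeleton = SKELETONS[(op, data_type)]
    return [line.format(left=left, right=right, result=result)
            for line in skeleton]


symbol_table = [
    {"name": "A", "type": "float"},
    {"name": "B", "type": "float"},
    {"name": "T1", "type": "float"},
]
instructions = generate((2, 0, "*", 1), symbol_table)
```

Retargeting to another machine in this sketch means replacing
`SKELETONS` only; the triads and the symbol table are untouched.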

Every data type has a predefined dominance relationship in operations
on two operands. Thus, for example, a multiply operation involving a
fixed-point and a floating-point number requires machine code
instructions to convert the data type of one of the numbers before the
multiply instruction is executed. In this case, floating-point
dominates fixed-point and consequently, the fixed-point number is
converted into floating-point. After the multiplication, the answer is
converted as needed to match the type specified for the answer.
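
The dominance rule in the preceding paragraph can be sketched in Python
as follows. The ordered list `DOMINANCE`, the `CONVERT` pseudo
instruction, and the `RESULT` placeholder are assumptions introduced
for this sketch; only the fixed-point and floating-point case comes
from the text.

```python
# Later entries dominate earlier ones (only fixed/float is from the text).
DOMINANCE = ["fixed", "float"]


def plan_operation(op, left, right, result_type):
    """Return pseudo instructions for `left op right`.

    `left` and `right` are (name, type) pairs. The operand of the
    dominated type is converted first; the answer is converted
    afterwards if its declared type differs from the dominant type.
    """
    (lname, ltype), (rname, rtype) = left, right
    dominant = max(ltype, rtype, key=DOMINANCE.index)
    code = []
    if ltype != dominant:
        code.append(("CONVERT", lname, ltype, dominant))
    if rtype != dominant:
        code.append(("CONVERT", rname, rtype, dominant))
    code.append((op, lname, rname, dominant))
    if result_type != dominant:
        code.append(("CONVERT", "RESULT", dominant, result_type))
    return code


plan = plan_operation("MULTIPLY", ("I", "fixed"), ("X", "float"), "fixed")
```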

The final task of the Second Language Translator is to replace each
symbol table address in the object code with the storage address
allocated to it and listed in the symbol table. This task is done after
the program is in the machine code for a specific target computer
inasmuch as it is possible that the translation to a specific machine
code may not be one-to-one. If an additional instruction were required
to be inserted after storage allocation, each reference to any address
beyond the point of insertion would have to be located and changed.
Such a task would be far too difficult to be practical.
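
A small Python sketch of this final pass is given below. It assumes
that generated instructions still carry operands tagged as
`("sym", n)`, that storage is allocated one word per entry from a base
address of 100, and that the mnemonics are placeholders; none of these
details is specified in the text.

```python
def assign_storage(symbol_count, base=100):
    """Map each symbol table address to a storage location (one word each)."""
    return {addr: base + addr for addr in range(symbol_count)}


def resolve(machine_code, storage):
    """Replace ('sym', n) operands with the storage location assigned to n."""
    resolved = []
    for opcode, *operands in machine_code:
        fixed = [storage[o[1]] if isinstance(o, tuple) and o[0] == "sym" else o
                 for o in operands]
        resolved.append((opcode, *fixed))
    return resolved


# Machine code already generated for the target, still symbolic.
machine_code = [
    ("LOAD", ("sym", 0)),
    ("MPY", ("sym", 1)),
    ("STORE", ("sym", 2)),
]
storage = assign_storage(3)
final_code = resolve(machine_code, storage)
```

Because substitution happens last, the code generator is free to emit
more than one instruction per triad without disturbing any address that
has already been fixed.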

Description of the Intermediate Languages (IL)

Functional Organization: The function of IL is to describe the user
program interfaces with the Generalized Compiler. It has formats that
are independent of both the source language and the computer on which
the user program is to be executed. User program statements fall into
two categories:

1) Nonexecutable information

2) Executable statements

Nonexecutable statements are used for non-changing information such as
data attributes, file attributes, storage allocation, and control
information. In IL, this information is in the form of a symbol table.

Executable statements describe the actions to be performed on the
source data during program execution in order to obtain the solution of
the user's problem. Executable statements are transformed in the
generalized compiler from higher level, function-oriented language
format, IL-1, into machine level operation-oriented format, IL-2.

Nonexecutable Information: Before the start of problem transformation,
the information from non-executable statements has been translated into
symbol table entries by the SLT 10 of Fig. 1. In addition, every
identifier appearing in any executable statement has been listed in the
symbol table. Thus, the symbol table 13 contains the names of all
identifiers in the source program and the attributes which have been
specified.

Attributes are grouped into fields in the symbol table. Any attribute
within a group can be specified by placing its assigned code in the
field. If no code is placed into the field, a default code will be
used. The default codes assigned by the compiler are loaded into the
symbol table prior to the start of language translation. The programmer
may declare his own choice of default attributes in his program.
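
The default-code behaviour can be illustrated with a short Python
sketch. The field names (`type`, `precision`, `scope`) and their codes
are hypothetical; the text says only that attributes are grouped into
fields, that unset fields take a default, and that the programmer may
declare different defaults.

```python
# Hypothetical attribute fields and compiler-assigned default codes.
COMPILER_DEFAULTS = {"type": "REAL", "precision": "SINGLE", "scope": "LOCAL"}


def new_entry(name, defaults=COMPILER_DEFAULTS, **declared):
    """Build a symbol table entry: defaults first, then declared attributes."""
    unknown = set(declared) - set(defaults)
    if unknown:
        raise KeyError(f"unknown attribute field(s): {sorted(unknown)}")
    entry = {"name": name, **defaults}
    entry.update(declared)
    return entry


x = new_entry("X")                              # every field defaulted
n = new_entry("N", type="INTEGER")              # one field declared

# Defaults declared by the programmer override the compiler's defaults.
program_defaults = {**COMPILER_DEFAULTS, "type": "INTEGER"}
k = new_entry("K", defaults=program_defaults)
```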

In addition to data attributes, nonexecutable information in the source
program includes names and sizes of arrays, relative storage locations
of identifiers (COMMON and EQUIVALENCE), formats of data in external
files, editing specifications, information about the hierarchical
structure of data, external names, Statement Function names, entry
names, function and library routines, and communications with the
operating system and compiler. All of this information is placed into
the symbol table 13 by the Source Language Translator (SLT) 10.

During problem transformation, additional names and facts are added to
the symbol table. The names assigned to the storage of intermediate
results are added. In addition, records may be kept of the most recent
time at which each intermediate result has been computed in the program
without a subsequent change in any of the variables used in the
computation.

The symbol table is used by the Machine Code Translator (MCT) 14 module
in conjunction with code skeletons to generate machine instructions.
The machine language instruction sequence selected depends upon not
only the operator involved but also the attributes of the operands.
During this time the operands in the user's program are all symbol
table addresses, which makes it simple to look up the attributes.

The MCT 14 assigns storage locations to each operand in the symbol
table. These addresses may be absolute or relative. Relative addressing
is used, especially in multiprogramming systems, to permit the
operating system to assign the actual storage to be used.

Executable Statements 

At the start of problem transformation, executable statements are in
IL-1 format and consist of a keyword followed by arguments, each of
which is followed by an EOE code (end of expression). Arguments may be
elements or expressions. Expressions consist of a sequence of
operand-operator pairs. Operands are symbol table addresses of either
elements or expressions.

Expressions may contain expressions which contain expressions to any
level of nesting within the limits of the Expression Processor
push-down stack. Each expression is terminated with an EOE (end of
expression) code which causes the stack to pop up one level. The EOE
code of the argument is at the first level and it causes a return of
control from the Expression Processor to the routine specified by the
keyword of the statement being processed. The Expression Processor also
returns the symbol table address of the name assigned to the
expression. The name is the address at which the value of the
expression is stored. The names of expressions are stored in a special
area of the symbol table so that the compiler can recognize expressions
within expressions by looking at the values of the operands, all of
which are symbol table addresses.

During problem transformation, each executable statement is transformed
into a sequence of simple instructions consisting of either an infix
operator and two operands or a prefix operator and one operand.

The keyword specifies the routine to be
used for the transformation. The position of
the argument in the sequence of arguments
establishes how it will be transformed or
used by the compiler. Generally, the arguments
will be processed by the Expression
Processor, which generates the sequence of
machine instructions necessary to compute
the value of the argument. The address of this
value is combined with the code skeleton for
the statement being processed. The code
skeleton for an executable statement is a sequence
of machine level instructions (triads like the
output of the Expression Processor) with
specific locations within this sequence assigned to
the value of each argument. This problem
transformation module contains a code
skeleton for each executable statement. These
code skeletons are machine-independent and
are not to be confused with the code skeletons
used by the code generator to generate
machine code for a specific machine. Some
executable statement compiler routines arrange for
the assignment of index registers and analyze
a larger part of the program for possible
optimization. The Expression Processor optimizes
at the low level of looking for duplicate triads
which can be eliminated. The symbol table
is used to store information needed for
optimization, such as the last location in a
program at which a variable changed value.
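The duplicate-triad search mentioned above can be pictured with a minimal Python sketch. It is not taken from the specification: the tuple layout (result name, operator, two operands) and the function name are hypothetical, and the sketch assumes that no operand changes value between the two identical triads (the specification keeps that information in the symbol table).

```python
from typing import Dict, List, Tuple

# (result name, operator, operand 1, operand 2); layout is an assumption
Triad = Tuple[str, str, str, str]


def eliminate_duplicate_triads(triads: List[Triad]) -> List[Triad]:
    """Drop triads that recompute an earlier (operator, operand, operand)."""
    seen: Dict[Tuple[str, str, str], str] = {}  # computation -> result name
    alias: Dict[str, str] = {}                  # dropped name -> surviving name
    kept: List[Triad] = []
    for name, op, a, b in triads:
        # Redirect references to results of triads that were dropped earlier.
        a = alias.get(a, a)
        b = alias.get(b, b)
        key = (op, a, b)
        if key in seen:
            alias[name] = seen[key]   # duplicate: reuse the earlier result
        else:
            seen[key] = name
            kept.append((name, op, a, b))
    return kept
```

A fuller version would consult the "value changed at location" entry for each operand before treating two triads as duplicates.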



Conceptually, the IL-2 code resulting from
problem transformation is machine independent.
The language transformation to a
specific machine is accomplished by the second
Language Translator, as MCT 14 of Fig. 1.
The purpose of this concept is to limit the
impact caused by a change in the object
computer (the computer on which the program
being compiled will be executed). This
concept requires that the problem transformation
output to the code generator be defined and
designed to be a set of machine operations
which any of a class of machines can perform
by some sequence of its own specific machine
instructions or micro-program instructions.

Implementation of IL - Nonexecutable Information

The implementation of this Generalized
Compiler may utilize current compiler
sub-programs and techniques to the extent possible
without compromising this separation of
language translation from problem transformation.
The IL format for nonexecutable information
is a symbol table, as 13 of Fig. 1. The
symbol table can be a conventional symbol
table. However, it requires more fields than
the symbol table for any one language since
not all languages have the same set of
nonexecutable information. For example,
FORTRAN does not use level numbers but Cobol
and PL/1 do.

Table 2, shown below, indicates a list of
the information to be incorporated into the
symbol table.

This list can be reviewed and refined during
writing of the compiler. It gives some
perspective into the size and limits of the task.
It is estimated that an average program would
have five to six hundred statements and two
thousand identifiers.



TABLE 2
Symbol Table Fields

1. Expression
Group Field Contents

General Identifier Name
Starting Address of Expression Component String

Type Statement Function
External Function
Argument
Triad

Compiler Notes Last location in program at which
value of expression was computed

Group 



2. Variable 

Field Contents 



General



Identifier Name 
Level No. 
Sign 

Complement (-Unary) 
Protect Read/Write 
Link to Additional Information 
Scope of Definition in Program 



Scalar 

Elementary 

Label 

Group Name 
Argument 

Temporary (Compiler Use) 
Pointer (Left or Right) 
Array 
Structure
2. Variable

Group 
Class (Cont.) 



Type 



Storage Allocation Info. 



(Continued) 
Field Contents 

Forward Linked List 

Double Linked List 

Tree 

Branch 

String 

Syntax Type Label 

Decimal/Binary 

Fixed /Floating-Point 

Real/Complex 

Arithmetic 

Logical 

Relational 

Condition 

Status 

Subscript Name 
Justified Left/Right 

Length (Precision) 
Length — Fractional Part 
Size in Each Dimension 
Structural Size 
Synchronized 

Contiguous to Preceding Item? 
Offset from Start of Structure 
Packed 
Filler 






Group 



2. Variable (Continued) 

Field Contents 



Storage Allocation Info. (Cont.) 



Common 
Equivalence 
Initial Values 

Starting Address-Assigned Storage 
Storage Space Required 






Compiler Notes 



Defined at Location 

Used at Location 

Link to Next Same Type or Level 

Link to Last Same Type or Level 

Value Changed at Location 



3. Linkage
Group Field Contents

General Identifier Name
Relative Address in Program
Link to Additional Arguments
List of Dummy Arguments
Return

Type Entry
External
Function
Sub-Routine

4. Constants, Literals, Character/Bit Strings
Group Field Contents 

General Identifier Value 

Assigned Address 
Length 

Length-Fractional Part 

Type Constant (Numeric Literal) 

Character String 
Bit String 

5. Program Structure Label 
Group Field Contents 

General Identifier Name or Number 

Program Location (Relative) 
Start & End 

Statement Sequence No. 

Defined 

Used 

Nesting Level 

Type Number/Name 

Statement 
Paragraph 
Section 
Division 
Do-End Group 
Begin-End Group 
Procedure

6. I/O Format
Group Field Contents 

General Identifier Name or Number 



Assigned Address 
External Format (Picture) 
Record Size 
Block Size 
Unit Assigned 

Type Format 

Picture 
Heading 
I/O Control 
File Control 
Buffer Control 

Unit Tape Unit 

Card Reader 
Punch 
Printer 
Console 
CRT 
Disk 

7. General Registers 
Group Field Contents 

General Symbolic Name 

Register Assigned 
Starting Address 
Freed at Address

Type Push-Down Stack
Pointer
Index
Base Register

In the symbol table, the identifiers are
separated into groups according to the types
of information fields needed for the identifier.
The groups are:

1. Expression name (Statement Function, Triad, Argument)
2. Variable name
3. Linkage (Entry, External, Function, Sub-routine)
4. Constant, Literal, Character String, Bit String
5. Program Structure Label
6. I/O Format
7. Register Utilization



IL Implementation — Executable Statements 

Basic Structure: The Source Language
Translator 10 translates all executable
statements of the program being compiled into
IL-1, which is the higher level language
format of the input to the Problem
Transformation Module. The output of the Problem
Transformation Module is in a machine-level
intermediate language called IL-2.

In the IL-1 format, executable statements
consist of keywords and arguments. Arguments
may be either simple items or expressions.
Expressions are sequences of operand-operator
pairs. Operands may be either simple
items or expressions. Fig. 5 shows the
structure of executable statements in the First
Intermediate Language Format, IL-1.



The general format is:

    Keyword {argument EOE}

Where:

    Argument      element or expression
    Element       symbol table address of a single item
    Expression    {operand operator} EOE
    Operand       symbol table address of an element
                  or an expression
    Operator      code for the action to be performed
                  on the related operands
    EOE           code for "end of expression"



Keywords: Every executable statement in 
IL — 1 has as its first word a keyword. The 
keyword specifies the location of the compiler 
routine used to process the arguments that 
follow. There is one keyword in IL — 1 for 
each Source Program function for which com- 
piling capability is required. Only one routine 
in the G. C. is required for any one function 
even though each source language may state 
the function in a different manner. 



For example, the Cobol statement, "add A 
to B giving C" and the Fortran statement, 
"C=A+B" both specify the same function, 
namely the assignment function. Therefore, 
regardless of the way the function is origin- 
ally stated, the same routine can process this 
task once it is translated into the standard 
G. C. input format, IL — 1. Table 3 indicates 
the list of keywords in IL — 1, the First 
Intermediate Language. 
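As a rough illustration of the example above, the following Python sketch translates the two source statements into one common record. The tuple layout, the function names, and the use of plain names instead of symbol table addresses are assumptions made for the illustration; the specification does not define IL-1 at this level of detail.

```python
from typing import List, Tuple

# Hypothetical stand-in for an IL-1 statement: (keyword, target, expression)
IL1Statement = Tuple[str, str, List[str]]


def from_cobol_add(stmt: str) -> IL1Statement:
    """Translate 'ADD a TO b GIVING c' (only this one form)."""
    words = stmt.upper().split()
    if (len(words) != 6 or words[0] != "ADD"
            or words[2] != "TO" or words[4] != "GIVING"):
        raise ValueError("unsupported COBOL statement: " + stmt)
    a, b, c = words[1], words[3], words[5]
    return ("ASSIGNMENT", c, [a, "+", b])


def from_fortran_assign(stmt: str) -> IL1Statement:
    """Translate 'c=a+b' (only a single addition on the right-hand side)."""
    target, expr = stmt.upper().replace(" ", "").split("=", 1)
    a, b = expr.split("+", 1)
    return ("ASSIGNMENT", target, [a, "+", b])
```

Both translators yield the same record, so a single ASSIGNMENT routine downstream can serve either source language.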






TABLE 3
IL Keywords (Higher Level IL)

IL Keyword      Additional source language keywords which map
                into the IL Keyword and its arguments

Storage Control
ALLOCATE
FREE

Operator Instructions
DISPLAY         REPLY, EVENT, PAUSE, ACCEPT

Conditions
ON              SNAP, SYSTEM
SIGNAL
REVERT

Program Control
DELAY           WAIT
STOP            END, RUN
EXIT

I/O Commands
OPEN            FILE, DIRECT, SEQUENTIAL, BUFFERED, UNBUFFERED,
                STREAM, RECORD, INPUT, OUTPUT, KEYED, EXCLUSIVE,
                BACKWARDS, TITLE, PRINT, LINESIZE, PAGESIZE
CLOSE           LOCK, REWIND
UNLOCK
LOCATE          FILE, SET, KEYFROM, FIND, REWIND, BACKSPACE
GET             READ, FILE, INTO, SET, IGNORE, KEY, KEYTO, COPY,
                SKIP, LIST, EVENT, NOLOCK, NAMELIST, ADVANCING,
                REWIND, RECORD, INVALID
UPDATE          FILE, KEY, EVENT, DELETE, REWRITE, FROM
PUT             WRITE, FILE, PAGE, SKIP, LINE, DATA, EDIT, LIST,
                FROM, KEYFROM, EVENT, ADVANCING, PUNCH, PRINT
GENERATE        INITIATE, TERMINATE, REPORT

Problem Execution
ASSIGNMENT      (element, array, structure) COMPUTE, ADD, SUBT.,
                MULT., DIV., BY NAME, ROUNDED, ON SIZE ERROR,
                GIVING, INTO, FROM, CORRESPONDING
DO              WHILE, TO, BY, PERFORM, UNTIL, TIMES, THRU,
                VARYING, FROM, AFTER, FOR
GO TO           DEPENDING ON
ALTER           TO PROCEED TO, ASSIGN
IF              THEN, ELSE, IS NOT, NUMERIC, ALPHABETIC, POSITIVE,
                ZERO, NEGATIVE, NEXT SENTENCE
MOVE            GET, PUT, CORRESPONDING
EXAMINE         REPLACING, TALLYING, ALL, LEADING, UNTIL, FIRST
TRANSFORM       CHARACTERS, FROM, TO
CONCATENATE
SEPARATE
CONTINUE
CALL            TASK, EVENT, PRIORITY, ENTER, LINKAGE
RETURN



Operators: To meet the specification that
IL-1 be source language independent, the
IL must describe all of the functions which
can be described by any source language of
practical interest. Such a task can be
hopelessly difficult, resulting in an excessively
large compiler if every source language
feature of every language is treated as a separate
function. It is important to analyze the source
language features to determine what functions
are actually involved.

The most elementary functions involve
elementary operations to be performed on
operands. These actions are specified by operators.
For the IL to be language independent, the
G. C. set of operators must encompass the
operator sets of the various source languages of
interest. From a practical standpoint, three
source languages, Cobol, Fortran IV, and
PL/1, have been considered in the
development of the G. C. Refinements, such as
the addition of a special operator from some
other source language, can be easily
evaluated.

In addition to the conventional operators
from the above source languages which specify
actions on data, delimiters needed in IL are
included. They are considered to be operators
(control operators) since they also specify
action to be taken.

The list of operators in the G. C. also
includes three new operators created for
implementation of IL. They are the prefix
operators .A., .I., and .O. All of these operators
relate to the addresses of the data involved.
The first operator, .A., specifies the operation
of determining the address of the associated
operand. The second operator, .I., specifies the
use of the operand value as the address of the
desired data. The third operator, .O., specifies that
the operand address is an offset address to
which a base address is to be added.

With these new operators, it is felt that the
number of source language functions which
can be described clearly and simply without
special routines is considerably increased. For
example, PL/1 has the "Pointer" and "Offset"
features which are classified as special
data types. The IL address operators can
clearly specify and process each of these
through the Expression Processor by treating
the address as a function of regular data.
Likewise, list processing source languages
create linked lists which can easily be
translated into IL using the address operators.

Table 4 shows a list of the operators in IL
and the precedence relationships. For operators
in the same precedence group, the order
of occurrence in an expression establishes the
precedence in a first come, first executed
(left to right) basis. As an exception to this
rule, Precedence Level group 2, the prefix
operators, are processed in a right to left
order.

TABLE 4
IL Operators and Precedence

Precedence
Level   IL Operators                                Description

0       EOE  )  ,  .O.                              CONTROL & OFFSET
1       **                                          EXPONENTIAL
2       + Prefix  - Prefix  .not.  ¬  .A.  .I.      PREFIX
3       *  /                                        ARITHMETIC
4       -  +                                        ARITHMETIC
5       ||                                          CONCATENATION
6       >  >=  =  ¬=  <=  <                         RELATIONAL
7       &                                           BIT STRING "AND"
8       |                                           BIT STRING "OR"
9       .and.                                       LOGICAL "AND"
10      .or.                                        LOGICAL "OR"
11      (  FN (Function Code)  SB (Subscript Code)  CONTROL

EOE ( ) , .O. FN SB ARE ALL CONSIDERED TO BE CONTROL OPERATORS.
THEY ARE NOT A PART OF THE OUTPUT CODE.
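The way a precedence table of this kind can drive the conversion of an expression into triads may be sketched as follows. This is only an illustrative Python sketch under stated assumptions: it covers binary operators only (no prefix or control operators, no parentheses), it reads a lower level number as binding more tightly, and the function name, the temporary names T1, T2, ... and the triad layout are hypothetical rather than taken from the specification.

```python
from typing import List, Tuple

# Levels for a few binary operators, following Table 4 (assumed: lower binds tighter)
LEVEL = {"**": 1, "*": 3, "/": 3, "+": 4, "-": 4, "||": 5,
         "<": 6, "=": 6, ">": 6, "&": 7, "|": 8, ".and.": 9, ".or.": 10}


def to_triads(words: List[str]) -> List[Tuple[str, str, str]]:
    """words alternates operand, operator, operand, ...; returns (op, left, right)."""
    operands = [words[0]]
    operators: List[str] = []
    triads: List[Tuple[str, str, str]] = []

    def reduce_top() -> None:
        op = operators.pop()
        right = operands.pop()
        left = operands.pop()
        triads.append((op, left, right))
        operands.append("T%d" % len(triads))  # name of the triad's result

    for i in range(1, len(words), 2):
        op, operand = words[i], words[i + 1]
        # Equal levels are reduced first: first come, first executed.
        while operators and LEVEL[operators[-1]] <= LEVEL[op]:
            reduce_top()
        operators.append(op)
        operands.append(operand)
    while operators:
        reduce_top()
    return triads
```

For example, `to_triads(["A", "+", "B", "*", "C"])` produces the multiplication triad first and then an addition whose second operand is the result `T1`.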

Operands: Operands are the symbol table
addresses of the identifiers in the user's
program. Identifiers which are the names of
expressions are kept in a separate area of the
symbol table so that the Expression Processor
can recognize expressions within expressions
with a minimum of effort.

Expressions: Expressions in IL-1 format,
the input to the Problem Transformation
Module, are simply, nearly one-to-one
transformations of source language expressions.
Operand names are replaced by the symbol
table addresses of the names. Operator codes
of the source language are replaced by the
operator code equivalent in IL. In order to
avoid duplication of the language translation
task of symbol recognition in the Problem
Transformation module 12, the expression
structure is designed so that every odd
numbered word in any expression is an operand
and every even numbered word is an operator.
To accomplish this format, the source
language translator adds a special code which
means "no operand" whenever necessary
(preceding the first operator or in between two
consecutive operators).

In case the precedence of operations in a
source language is different from the
precedence conventions adopted for the IL, the
Source Language Translator for that language
must insert parentheses as necessary into the
source program expressions during translation
into IL unless provision is made in the G. C.
to accept a user definition of the precedence
rules.

Arguments: The arguments in IL-1
represent the information content of the
executable statements of the user's program.
Inasmuch as IL is not exposed to the programmer
except during the writing of the compiler,
most of the redundancy common in
source languages has been removed. An
argument may be a single value such as a DO loop
parameter, or it may be an expression such as
the expression to the right of the equals sign
in an assignment statement or an expression
which represents the value of a parameter.
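The odd/even word rule for expressions described above can be illustrated with a short Python sketch. The marker value, the operator set, and the function name are placeholders chosen for the illustration; the specification does not state the actual "no operand" code.

```python
from typing import List

NO_OPERAND = "<none>"  # placeholder for the special "no operand" code
OPERATORS = {"+", "-", "*", "/", "**", ".not."}  # illustrative subset


def pad_expression(tokens: List[str]) -> List[str]:
    """Insert NO_OPERAND so operands and operators strictly alternate."""
    out: List[str] = []
    expect_operand = True  # an expression starts with an operand slot
    for tok in tokens:
        if tok in OPERATORS:
            if expect_operand:
                out.append(NO_OPERAND)  # fill the empty operand slot
            out.append(tok)
            expect_operand = True
        else:
            if not expect_operand:
                raise ValueError("two consecutive operands: " + tok)
            out.append(tok)
            expect_operand = False
    return out
```

After padding, every odd numbered word (counting from one) is an operand or the placeholder, so the receiving module can tell operands from operators by position alone.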
EOE: EOE is the symbolic name used to
represent the internal code used in IL-1 to
delimit (that is, to separate) expressions and
arguments. The code used is one of the codes
not assigned for character representation in
either the ASCII or EBCDIC code. This code
is used to pop up the push-down stack of the
Expression Processor to the next level at the
end of processing an expression and to
advance the statement processing routine to
the next argument at the end of compiling
an argument.

Statements: Executable statements in a
source language specify mappings or
transformations to be performed on data during
execution of the user's program. These
transformations are functions which are independent
of the source language in which they are specified. The
Source Language Translator 10 transforms
the functional values and expressions of the
source statement into the respective
arguments of an IL-1 statement. The arguments
in an IL-1 statement are the values or
expressions to be processed by the Problem
Transformation module 12.

For example, the source language statement
"GO TO 101" specifies the unconditional
branch function in Fortran. In some other
source language, the keyword might be
"BRANCH TO" P. (or any other symbol)
without affecting the function to be performed.
Likewise, the identifier might be an
alphanumeric name or an expression without
changing the function.



Some complicated functions, which can be
expressed in one sentence, are equally
specified by any one of several different
sets of arguments. For example, the test for
continuing the iteration in a DO loop may be
specified as the upper limit of the loop variable
or it may be specified as a test of a logical
expression, in which case iteration continues
for as long as the value of the logical
expression is "true". It seems more practical to
handle this type of situation in IL-1 by
providing argument positions in the IL-1
statement for each argument of each alternative
rather than imposing a more complicated
language translation from source language. The
Problem Transformation Module 12 recognizes
each argument position by the EOE code which
is present even for argument positions which
have no argument. Error detecting logic is
used to ensure the presence of a suitable set
of arguments. Tables 5(a) and 5(b) are
examples of FORTRAN and COBOL source
language iteration statements expressed in a
newly developed Metalanguage which is
described subsequently hereinafter. Table 6
shows the PL/1 source language iteration
statement in the Metalanguage notation, while
Table 7 shows, in Metalanguage notation, the
single, generalized, First Intermediate
Language (IL-1) iteration statement into which
each of the Fortran, Cobol, and PL/1
statements of Tables 5 and 6 map.

Tables 8(a) and 8(b) illustrate how the
iteration statements from FORTRAN and
COBOL map into this single IL-1 statement,
in Metalanguage notation.

TABLE 5 (a)
FORTRAN Iteration Statements Written In New Metalanguage

a_stmt1  DO a_stmt2 i10 = {kip1 | ip1}, {kip2 | ip2}, {kip3 | ip3}
a_stmt3  DO a_stmt4 i11 = {kip4 | ip4}, {kip5 | ip5}, {kip6 | ip6}
a_stmt5  DO a_stmt6 i12 = {kip7 | ip7}, {kip8 | ip8}, {kip9 | ip9}
a_stmt6  (executable program statements)
a_stmt4
a_stmt2

TABLE 5 (b)
COBOL Iteration Statements Written In New Metalanguage

PERFORM a_proc1 [THRU a_proc2] [{kip10 | ip10} TIMES]

PERFORM a_proc1 [THRU a_proc2] UNTIL EI1

PERFORM a_proc1 [THRU a_proc2] VARYING vn1 FROM {kn1 | vn4}
    BY {kn2 | vn5} UNTIL EI1
    [AFTER vn2 FROM {kn3 | vn6} BY {kn4 | vn7} UNTIL EI2
    [AFTER vn3 FROM {kn5 | vn8} BY {kn6 | vn9} UNTIL EI3]]



TABLE 6
PL/1 Iteration Statements Written In New Metalanguage

a1: DO vn1 = mn1 TO mn2;
a2: END [a1];

a1: DO;
a2: END [a1];

a1: DO WHILE NOT EI2;
a2: END [a1];

a1: DO vn1 = mn1 [TO mn2 [BY mn3]]
    [WHILE NOT EI1];
a2: END [a1];

TABLE 7
IL-1 Iteration Statement Written In New Metalanguage

DO c1 a1 a2 vn1 mn1 mn2 mn3 EI1
    [a3 a4 vn2 mn4 mn5 mn6 EI2
    [a5 a6 vn3 mn7 mn8 mn9 EI3]]

WHERE:

c1                  if zero, test before iterating. If one, iterate
                    before testing.

a1, a2, a3,         are statement sequence numbers assigned by the
a4, a5, a6          source language translator or symbol table
                    addresses of statement, paragraph or section
                    labels. If a1, a3, a5 specify paragraphs or
                    sections, a2, a4, and a6 are blank.

vn1, vn2, vn3       specify DO loop variables.

mn1, mn4, mn7       initial values of DO loop variables.

mn2, mn5, mn8       the limiting values of the DO loop variables.

mn3, mn6, mn9       the value by which the DO loop variables are
                    incremented (or decremented, if negative) after
                    each iteration.

EI1, EI2, EI3       logical expressions which are tested before each
                    iteration. Iteration continues until expression
                    is true.
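To make the argument-position idea concrete, the following Python sketch builds a single-level IL-1 iteration record from the parts of a FORTRAN DO statement and flattens it into a word list in which every argument position, filled or blank, is closed by an EOE code. The class, the function names, and the EOE placeholder are hypothetical; the choice of c1 is left to the caller because the text does not say which value a FORTRAN DO should receive.

```python
from dataclasses import dataclass
from typing import List, Optional

EOE = "<EOE>"  # placeholder for the internal end-of-expression code


@dataclass
class IL1Do:
    """First nesting level only of the IL-1 iteration statement."""
    c1: int
    a1: str
    a2: Optional[str]
    vn1: Optional[str]
    mn1: Optional[str]
    mn2: Optional[str]
    mn3: Optional[str] = None
    ei1: Optional[str] = None


def from_fortran_do(a_stmt1: str, a_stmt2: str, i10: str, p1: str, p2: str,
                    p3: Optional[str] = None, *, c1: int) -> IL1Do:
    # a_stmt -> a, loop index -> vn1, parameters -> mn1..mn3
    return IL1Do(c1=c1, a1=a_stmt1, a2=a_stmt2, vn1=i10, mn1=p1, mn2=p2, mn3=p3)


def encode(stmt: IL1Do) -> List[str]:
    """Keyword followed by each argument position, each closed by EOE."""
    words = ["DO"]
    for value in (stmt.c1, stmt.a1, stmt.a2, stmt.vn1,
                  stmt.mn1, stmt.mn2, stmt.mn3, stmt.ei1):
        if value is not None:
            words.append(str(value))
        words.append(EOE)  # present even when the position is blank
    return words
```

A blank position shows up as two consecutive EOE codes, which is how a receiving routine could count positions without a separate length field.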



TABLE 8 (a)
Iteration Maps

SOURCE (FORTRAN)        IL-1

a_stmt i                a i
kip1 thru kip9          mn1 thru mn9
ip1 thru ip9            mn1 thru mn9
i10                     vn1
i11                     vn2
i12                     vn3

TABLE 8 (b)
Iteration Maps

SOURCE (COBOL)          IL-1

a_proc1                 a1
a_proc2                 a2
kip10                   mn1 = 1, mn2 = kip10
ip10                    mn1 = 1, mn2 = ip10
vn4                     mn1
vn5                     mn3
vn6                     mn4
vn7                     mn6
vn8                     mn7
vn9                     mn9



IL — 2 Format 

The output of the Problem Transformation
Module 12 consists of executable instructions
in IL-2. Unlike IL-1, which is in higher
level language, IL-2 is at the machine
operation level, which means that the translation to
a specific machine code from IL-2 is
approximately one-to-one. The Problem
Transformation Module 12 transforms each
statement from IL-1 format into a sequence of
instructions, each of which consists of a macro
call or an operation code and the one or two
operands involved in this operation. Table 8(c)
is an example of the IL-2 skeleton from
which the IL-2 instruction sequence will be
generated by the Generalized Compiler as per
the sequence of tasks listed in Table 8(d) as
the mapping of the IL-1 statement of Table 7.

TABLE 8 (c)
IL-2 Instruction Sequence Table for IL-1
Iteration Statement (Code Skeleton)

STMT  STMT
NO    ADDRESS  INSTRUCTION

1              vn1 = mn1
2              vn2 = mn4
3              vn3 = mn7
4              GO TO a7
5     a1       CONTINUE
6              IF vn1 > mn2 GO TO a2
7              IF vn1 < mn2 GO TO a2
8              IF vn1 > mn2 AND mn3 > 0 OR
               vn1 < mn2 AND mn3 < 0 GO TO a2
9     a3       CONTINUE
10             IF vn2 > mn5 GO TO a4
11             IF vn2 < mn5 GO TO a4
12             IF vn2 > mn5 AND mn6 > 0 OR
               vn2 < mn5 AND mn6 < 0 GO TO a4
13    a5       CONTINUE
14             IF vn3 > mn8 GO TO a6
15             IF vn3 < mn8 GO TO a6
16             IF vn3 > mn8 AND mn9 > 0 OR
               vn3 < mn8 AND mn9 < 0 GO TO a6
17             IF EI1 GO TO a2
18             IF EI2 GO TO a4
19             IF EI3 GO TO a6
20    a7       CONTINUE
21             <statements to be iterated from a5 to a6>
22             vn3 = vn3 + mn9
23             GO TO a5
24    a6       CONTINUE
25             <statements to be iterated from a3 to a4>
26             vn2 = vn2 + mn6
27             GO TO a3
28    a4       CONTINUE
29             <statements to be iterated from a1 to a2>
30             vn1 = vn1 + mn3
31             GO TO a1
32    a2



Table 8(d) shows the tasks performed by
the Generalized Compiler GC-11 with
respect to mapping the IL-1 iteration
statement of Table 7.

TABLE 8 (d)
Generalized Compiler Tasks
(Statement numbers refer to statements in the
IL-2 Instruction Sequence Table)

Task

1. <If a1 is blank, go to task 30 (error
exit)>

2. <If a2 is blank, a1 is the symbol table
address of the procedure to be iterated.
Look up the a2 field in the symbol table
word for this procedure. Set up the code
to call this procedure (not shown in the
IL-2 instruction sequence)>

3. <If a2 is given, look up in the symbol
table to determine if a2 is a procedure.
If it is, replace the procedure address
specified as a2 with the a2 field in the
symbol table word for this procedure.
Set up the code to call these procedures
(not shown in the IL-2 instruction
sequence)>

4. <If mn1 is blank and mn2 is specified,
assign a value of 1 to mn1>

5. <If a3 is blank, set a3 equal to a1, a4
equal to a2, and delete statements 9, 25,
27, 28>

6. <If a5 is blank, set a5 equal to a3, a6
equal to a4, and delete statements 13,
21, 23, 24>

7. <If mn2 and mn3 are both blank, delete
statements 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
22, 26, 30, 31 and go to task 29. This
is a single iteration regardless of whether
EI1 is specified>

8. <If c1 is zero, delete statements 4, 20>

9. <If EI1 is blank, delete statements 17,
18, and 19 and go to task 11>

10. <If mn1 is blank, delete statements 18,
19>

11. <If mn1 is blank, delete statements 1, 2,
3, 6, 7, 8, 10, 11, 12, 14, 15, 16, 22,
26, 30, and go to task 29>

12. <If mn3 is blank, assign the value 1
to it>

13. <If mn3 is a positive literal, delete
statements 7 and 8 and go to task 16>

14. <If mn3 is a negative literal, delete
statements 6 and 8 and go to task 16>

15. <delete statements 6 and 7>

16. <If EI2 is blank, delete statements 18,
19 and go to task 18>

17. <If mn4 is blank, delete statement 19>

18. <If mn4 is blank, delete statements 2,
3, 10, 11, 12, 14, 15, 16, 22, 26, and go
to task 29>

19. <If mn6 is blank, assign the value 1 to
it>

20. <If mn6 is a positive literal, delete
statements 11 and 12 and go to task 23>

21. <If mn6 is a negative literal, delete
statements 10 and 12 and go to task 23>

22. <delete statements 10 and 11>

23. <If EI3 is blank, delete statement 19>

24. <If mn7 is blank, delete statements 3, 14,
15, 16, 22 and go to task 29>

25. <If mn9 is blank, assign a value of 1
to it>

26. <If mn9 is a positive literal, delete
statements 15 and 16 and go to task 29>

27. <If mn9 is a negative literal, delete
statements 14 and 16 and go to task 29>

28. <delete statements 14 and 15>

29. <generate IL-2 output code sequence
for the statements not deleted>

30. <error exit>
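The task list works by starting from the full 32-statement skeleton and deleting the statements that do not apply. The Python sketch below illustrates that pruning style for a small subset of the tasks only (5, 6, 8 and 12 to 15); it is not a full rendering of Table 8(d). The function name and parameters are hypothetical, a blank argument is modelled as `None`, a literal as a Python number, and a variable as a string.

```python
from typing import List, Optional, Union


def prune_skeleton(a3: Optional[str] = None, a5: Optional[str] = None,
                   c1: int = 0,
                   mn3: Union[int, float, str, None] = None) -> List[int]:
    """Return the skeleton statement numbers left after a few of the tasks."""
    keep = set(range(1, 33))         # statements 1..32 of the code skeleton
    if a3 is None:                   # task 5: no second nesting level
        keep -= {9, 25, 27, 28}
    if a5 is None:                   # task 6: no third nesting level
        keep -= {13, 21, 23, 24}
    if c1 == 0:                      # task 8: test before iterating
        keep -= {4, 20}
    if mn3 is None:                  # task 12: default increment of 1
        mn3 = 1
    is_literal = isinstance(mn3, (int, float))
    if is_literal and mn3 > 0:       # task 13: only the ">" test is needed
        keep -= {7, 8}
    elif is_literal and mn3 < 0:     # task 14: only the "<" test is needed
        keep -= {6, 8}
    else:                            # task 15: sign unknown until run time
        keep -= {6, 7}
    return sorted(keep)
```

With a positive literal increment only statement 6 survives among the three limit tests; with a variable increment the combined test in statement 8 is kept instead.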

General Characteristics of the Intermediate Language

Table 9 lists the characteristics of the IL.

TABLE 9
Characteristics of the Intermediate Language

1. It is source language independent.

2. The input, IL-1, is function-oriented.

3. Executable input statements, in IL-1,
are in higher level language format.

4. There are no data declaration statements.
All data names and attributes are in a
symbol table.

5. All data names not explicitly defined
automatically assume default attributes.

6. All names (operands) are symbol table
addresses.

7. Operator codes and punctuation codes
are sufficient to describe all generally
used source language operations.

8. Every executable input statement starts
with a standard keyword for the function
to be executed.

9. It is independent of the object computer.

10. The output, IL-2, is computer operation
oriented.

11. IL-2 output code is at the machine
operation level.

Tables 5, 6, 7 and 8 provide an example
of the transformations involved using IL.
Tables 5 and 6 show various source language
statements used in FORTRAN, COBOL and
PL/1 to specify the iteration function. Table
7 shows the IL-1 format which can represent
any of the various source statements shown
in Tables 5 and 6.

In other words, a common iterative function
can be specified by a FORTRAN "DO" loop
or a COBOL "PERFORM" statement or a
"DO END" group of PL/1. Therefore,
any of the above source language formats for
iteration can be easily translated by SLT 10
into a single format, IL-1, and this
standardized statement can be processed by a
Generalized Compiler GC-11. This Generalized
Compiler GC-11 may be designed as a
module such as the ALU of the host
computer, or it may be designed as a piece of
peripheral equipment or a set of subroutines.
Such a design may be economically feasible
in hardware because of the wide application
of the same compiler to many computers and
many source languages. In this example for
iteration, the intermediate language input
format, IL-1, at the input to the GC-11 is
shown in Table 7.

Where: The letters specify dummy
variables to be replaced by scope of definition,
indexing labels, index range limits, increments
and conditions. The output of the GC-11 is
IL-2 and consists of machine level
instructions (such as: compare, increment, branch,
and test) inserted in the proper sequence and
locations in the text of the problem program.
Although the output is at the machine code
level, it is function-oriented and machine
independent. The translation of this output into
a specific machine code is, in most cases, a
simple task of table look-up. Table 8 shows the
IL-2 output instructions into which the IL-1
statement of Table 7 is transformed
during problem transformation. In addition, the
compiler system may add instructions to save
and restore the contents of index registers as
needed, as well as other supporting operations.
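The remark that translation of the machine-independent output into a specific machine code is largely a table look-up can be pictured with the following Python sketch. The opcode table and the three-address instruction format belong to an imaginary target machine invented for the illustration; neither comes from the specification.

```python
from typing import Dict, List, Tuple

# Hypothetical opcode table for an imaginary three-address machine
OPCODES: Dict[str, str] = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def generate(triads: List[Tuple[str, str, str]]) -> List[str]:
    """One machine-independent triad becomes one target instruction."""
    lines = []
    for i, (op, a, b) in enumerate(triads, start=1):
        # Look up the target mnemonic; the i-th result is named T<i>.
        lines.append("%s T%d, %s, %s" % (OPCODES[op], i, a, b))
    return lines
```

Retargeting in this picture means supplying a different table, while the routine that walks the triads stays unchanged.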

Advantages of the Computer System Using
Generalized Compiler Concept

The advantages of this computer system are
primarily the result of organizing the work
into independent decentralized functions and
isolating these functions. As a result, the
following advantages are obtained. These are
briefly outlined first and then discussed in
detail.

Advantages:

1. Compiler design cost is less because the
compiler is useful over a broader range
of application, resulting in economies of
volume.

2. Compiler design can be optimized and
refined by iteration of the results of initial
design because the broader use and longer
lifetime produce sufficient returns from
even a small improvement.

3. Compiler is simpler to design and debug
because it is organized into independent
decentralized modules.

4. Compile time can be substantially reduced
by implementing the design in
hardware-firmware.

5. Operating system complexity can be
reduced because of the reduced demands on
the system by the compiler.

6. Source language development is simplified.

7. Better language features will permit more
program optimization by the programmer.

8. It is compatible with the development of
common data base systems and other
current projects which may radically alter
compiler design specifications.

Cost: The current compiler design
approach requires a separate unique compiler
design for each source language/machine
combination. Each compiler has relatively little
utilization and a short lifetime. In comparison,
a large part of a generalized compiler can
be used without change with numerous source
languages and various makes and generations
of computers. Therefore, the cost of the
design is distributed over a larger number of
units and is proportionately less.

Compiler Improvements: Because of the
complexity of current compilers and their
limited applications, once a compiler design
is debugged, the design effort generally ceases.
Changes are strongly resisted because of the
risk of creating unsuspected new problems. In
addition, it is not justified to redesign in
order to overcome disadvantages discovered
during operation because the potential
improvement will benefit only a relatively small
user group. In comparison, extensive analysis
of the Problem Transformation Module of the
Generalized Compiler during use is justified
because it performs standard functions
common to all compilers and the benefits of any
improvement will be widespread.

Simplicity: The concept of separating the independent parameters of
language translation and problem transformation into separate modules,
in effect, decentralizes the control from one complicated control and
flow sequence into three independent simple tasks. As a result, design
details do not impact the whole compiler but are clearly limited.
Therefore, designing and debugging are easier.

Speed: Because the Problem Transformation Module is independent of both
source language and object computer, it can be used with many source
languages and many computers. It will not change with changes in either
source language or computer machine codes.

Therefore, it is feasible to design this unit in hardware-firmware,
providing a potential speed improvement of several orders of magnitude.
This portion of the compiling task can be performed at nearly machine
cycle speeds.
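
The three-module division described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the two toy source syntaxes, the tuple form used for the intermediate statements, and the two mnemonic tables are assumptions made for illustration and are not taken from this specification. The sketch shows only that the middle stage stays unchanged when a source language or an object machine is added.

```python
from typing import Callable, List, Tuple

IL1 = Tuple[str, str, str]   # assumed form: (operation, target, operand)
IL2 = Tuple[str, str]        # assumed form: (operation, argument)


def slt_toy_a(source: str) -> List[IL1]:
    """Source Language Translator for a made-up syntax: 'ADD X TO Y'."""
    out = []
    for line in source.strip().splitlines():
        words = line.split()
        if len(words) == 4 and words[0] == "ADD" and words[2] == "TO":
            out.append(("add", words[3], words[1]))
    return out


def slt_toy_b(source: str) -> List[IL1]:
    """Source Language Translator for a made-up syntax: 'Y = Y + X'."""
    out = []
    for line in source.strip().splitlines():
        target, expr = [part.strip() for part in line.split("=")]
        _, operand = [part.strip() for part in expr.split("+")]
        out.append(("add", target, operand))
    return out


def problem_transformation(il1: List[IL1]) -> List[IL2]:
    """Language- and machine-independent stage: lower IL-1 to IL-2."""
    il2: List[IL2] = []
    for op, target, operand in il1:
        il2 += [("load", target), (op, operand), ("store", target)]
    return il2


# Hypothetical mnemonic tables for two imaginary object machines.
MCT_TABLES = {
    "machine_x": {"load": "LDA", "add": "ADD", "store": "STA"},
    "machine_y": {"load": "L", "add": "A", "store": "ST"},
}


def mct(il2: List[IL2], machine: str) -> List[str]:
    """Machine Code Translator: one table lookup per IL-2 statement."""
    table = MCT_TABLES[machine]
    return [f"{table[op]} {arg}" for op, arg in il2]


def compile_program(source: str, slt: Callable[[str], List[IL1]],
                    machine: str) -> List[str]:
    return mct(problem_transformation(slt(source)), machine)
```

In this sketch, adding a third source syntax means writing one more small translator function, and adding a third machine means adding one more table; `problem_transformation` is not edited in either case.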

Reduced Supervision: Operating system complexity and the resulting time
spent by the computer on operating system duties (non-productive
expense) are becoming major problems. As languages become extended,
compilers become larger and require more operating system effort to
schedule and to switch to and from active storage, especially in a
multi-programming environment.

In the Generalized Compiler, the language translators (SLT 10 and MCT
14) are much smaller and therefore easier to schedule and switch. The
scheduling of the ALU is greatly reduced, especially if the Problem
Transformation Module is implemented in hardware. In addition, the
switching and active storage needs are reduced. It may now be feasible
to perform the entire compile task in peripheral equipment, further
simplifying the operating system.

Source Language Improvements: In the Generalized Compiler, the Source
Language Translator is the only module affected by a change in the
source language. The translation in this module is a simple,
approximately one-to-one translation into intermediate language.
Therefore, source language changes to better adapt the language to the
user are relatively easy to implement. Likewise, the impact on the
compiler is strictly confined, which means that the risk of adding
unsuspected errors is proportionately reduced.

Better Optimization: Small language translators, as in the Generalized
Compiler, are easier to transfer into and out of active storage, and
consequently source program switching from one language to another
within a program may be practical. This would allow the programmer more
flexibility to optimize either the writing or execution of a program.
Since the programmer knows more than the compiler designer about the
context of the problem being programmed, the programmer is in the best
position to optimize the program globally. Local optimization, on the
other hand, is probably best done by the compiler.

Reduced Obsolescence: Because the Generalized Compiler is divided into
independent modules, the impact of a change is strictly limited.
Likewise, it can be adapted to radical changes. For example, the work
now being done on data base management and a common data base for a
community of users may lead to the separation of compiler tasks into two
specialized compilers, one of which compiles data attributes and
environment information, and the other of which compiles the executable
procedures. The Generalized Compiler is not only compatible with and
adaptable to this concept, but may accelerate its development.

Description: New Metalanguage

From general considerations, it was felt desirable to provide:

(a) Separation of data attribute declarations from executable statements
into a separate compiler function to provide a run-time symbol table or
a data base, so that data attributes can be taken into consideration
during program execution. In other words, provision is made in the
compiling process to handle the common data base.

(b) Development of the Generalized Compiler concept of IL-1 so that
simple translators can be designed for each source language.

(c) Development of a software writer's intermediate language, IL-2, so
that operating systems and compilers can be written and debugged in a
simpler fashion by using an intermediate language.

In order that this might be done, it was felt necessary to develop a new
Metalanguage in order to be able to precisely define the elements of
various source languages. Since a good precise definition of a "problem"
is the best basis of a good "solution", it is necessary that there be
provided a new Metalanguage which can be used as a tool to organize and
simplify complex higher level language information. Existing
metalanguages either fail to specify this information precisely or fail
to be concise. If the source language statements are not specified both
precisely and concisely, the information content and, therefore, the
mapping requirement will not be properly comprehended.

Thus, in order to develop, from a source language, a first intermediate
language (IL-1) suitable for operation with a Generalized Compiler, and
to develop a second intermediate language (IL-2) which is suitable for
easy conversion into the object machine code for any specific type of
computer, it was felt necessary to develop a precise method of handling
and defining the information content of all the major source languages.
The new metalanguage is a tool which aids in recognizing identical
information among various source languages, by systematically labeling
the information content of each source language statement, so that
identical information units from various source languages can be mapped
into the same IL-1 function.

Taking into consideration the problem of man-machine communication
between source languages (such as COBOL and FORTRAN) and software
writers, it is seen that some sort of communication link is required
between these two elements.

Metalanguage, which is here developed as the notation used to specify
elements in source languages, is the communication link between the man
(the software writer) and the machine (the source language). Possibly a
major cause of the chronic failure of software to be delivered on time
and to perform to specifications is the rush of software people to go to
work on the final product while ignoring the importance of developing
good tools for the job. Such good tools would primarily be good notation
(Metalanguage) and good documentation. Many poor software developments
are the direct result of the failure to properly evaluate jobs due to a
deficiency of notation or of adequate tools for dealing with source
languages.



Imprecise or poor notation not only fails to clarify the requirements,
but also creates misconceptions about the complexities and relative
importance of various components of the task to be done. Thus, poor
working programs often are the result of poor communication, and poor
communication is very often the result of poor notation (poor
Metalanguage). Thus, good notation is extremely important in providing
insight and good judgment for the software developer.

A Metalanguage, to be a precise and useful tool, should be designed so
that it can precisely describe a language at each hierarchical level, in
terms of the next lower hierarchical level. This approach is required to
provide insight into the context of the problem by eliminating the
saturation "noise" from lower levels in the hierarchy.

It is of great importance to recognize the information content of the
statements in the source languages so that the intermediate language
(IL) will preserve the information content and so that identical
information content from different source languages can be recognized as
such and mapped into the same intermediate language statement without
redundancy.
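
The idea of mapping identical information content from different source languages into one intermediate statement can be sketched as a lookup table. In the minimal Python sketch below, the COBOL relation words are those that appear in this specification's COBOL "IF" example and the FORTRAN spellings are the standard FORTRAN relational operators; the names `rel_gt`, `rel_eq` and `rel_lt` are hypothetical stand-ins for intermediate language functions and are not defined in this specification.

```python
# Hypothetical table: (source language, source spelling) -> IL function name.
RELATION_TO_IL = {
    ("COBOL", "GREATER THAN"): "rel_gt",
    ("COBOL", "EQUAL TO"): "rel_eq",
    ("COBOL", "LESS THAN"): "rel_lt",
    ("FORTRAN", ".GT."): "rel_gt",
    ("FORTRAN", ".EQ."): "rel_eq",
    ("FORTRAN", ".LT."): "rel_lt",
}


def to_il(language: str, spelling: str) -> str:
    """Return the (assumed) IL function for one source relation."""
    return RELATION_TO_IL[(language.upper(), spelling.upper())]


def same_information(a: tuple, b: tuple) -> bool:
    """True when two source spellings carry the same information content."""
    return to_il(*a) == to_il(*b)
```

Two spellings that differ on the surface collapse onto one entry, so nothing downstream of the table needs to know which source language was used.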

One of the Metalanguages currently in common usage to describe computer
languages is called Backus Normal Form, BNF. Although it appears to be
useful for simple expressions, it is not practical to use BNF to
describe complex higher level languages. In BNF, a pair of < > is used
to delimit each generic term, and furthermore, a full word is generally
used for each such term. This constitutes notational clutter and excess
verbiage. The structure of the word used gives no indication of the
amount of information involved. For example:

the notation <integer> is "larger" than <term>, although term refers to
a much larger unit of information.

Thus the hierarchical levels of information cannot be easily labeled or
identified.

A precise notation can be an effective tool to aid in analyzing
languages to determine the actual information specified. To be useful,
the Metalanguage must be capable of specifying the language being
analyzed concisely and precisely, and it must be easy to learn and use.
The Metalanguage structure itself should assist by conveying a part of
the information. Thus the following requirements are stated as the
specifications for a Metalanguage:

1. The meaning must be unambiguous.

2. It must be capable of describing higher level languages precisely and
concisely.

3. It must not bury the major information units in a mass of detail
which constitutes "noise" to the user.

4. It must be capable of specifying information at each hierarchical
level in terms of the next lower level of information units.

5. It is useful as a tool to provide insight 
into the true nature of the problem involved. 

Newly Developed Metalanguage (ML)

The newly developed Metalanguage is a set of symbols and rules designed
to permit the description of statements and the specification of the
syntax and semantics rules of various higher level languages such as
COBOL, FORTRAN, etc. in a precise and concise manner. In addition, it is
designed to minimize notational clutter and confusion so that it can be
used as a tool in analyzing the information content of various
languages. The notational structure itself conveys information about the
(a) magnitude and (b) importance of the information being represented.
The notation is designed to be easy to learn and easy to remember.
Figures 8 and 9 show language concepts organized into a hierarchy of
magnitude and importance, with the use of special notation to
differentiate the hierarchy and meaning of concepts.

Thus, a single lower case letter (Fig. 9) is used to represent a
primitive information unit or element (any one of a set of similar
elements). The single lower case letter with a lower case subscript or
suffix represents a subset of the set represented by the letter. Two
major subsets of all information elements are "identifiers" and
"operators".

It should be stressed that this described Metalanguage notation is
exemplary only with respect to the symbols and scope of definitions.
Other symbols and definition scope could be used to describe the same
precise method of organizing a hierarchy of concepts.

Combinations or strings of elements are specified by a single capital
letter. Thus, as seen in Figures 6 and 8, the letter "E" is used to
specify one member (any member) of the set of all expressions. Subsets
are specified by subscripting lower case letters or numbers to the
single capital letter, as seen by Mx or E1.

Subsets are also specified by suffixing lower case letters to the single
capital letter. Thus, "Eln" (Fig. 8) is any member of the subset that
includes all logical numeric expressions.

In addition to the information conveyed by the structure (whether the
letter is upper case or lower case), the letters assigned to various
sets are selected for ease of learning and association with the set
referred to. Thus, "E" is for expressions; "e" is for primitive
elements; "o" is for operators, etc.

All of the above notation is for specifying "generic" sets of
information. In order to specify a "particular" member of any set, the
following rules apply: When a specific member is specified, it is
considered to be a literal of the Metalanguage and it is written
entirely in capital letters, if it is an identifier of 2 or more letters
such as a reserved word. Digits and source language special symbols are
written without change or extra notation. Any source language letters
appearing in a literal are written as capital letters.

Since the Metalanguage symbols are not the same as those used by the
source languages, there is no difficulty in distinguishing the source
language symbols from Metalanguage symbols. The only exception is the
three dot (. . .) repetition symbol, which is a Metalanguage symbol also
used as a JOVIAL language symbol. Because this symbol is so convenient
and because JOVIAL is not a commonly used language, it is felt that when
the JOVIAL repetition notation is needed it can be specified by an h
subscripted with the symbol (h . . .).

Since the major source languages of interest do not use lower case
letters, the Metalanguage lower case letters, and combinations of one
capital letter with one or more lower case letters, are easily
identified as Metalanguage. Source language words of one letter or
character which can be confused with the upper case Metalanguage
notation (or the repetition symbol in JOVIAL) are specified by the lower
case letter h with a subscript consisting of the character being
specified (a capital letter, a digit, the repetition symbol, or another
symbol). Thus, the source language letter "A" (a Metalanguage literal)
is specified as "hA".
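
The case conventions described above are mechanical enough to be checked by a short routine. The following Python sketch is an illustration only: it assumes that subscripts are written inline after their letter (for example "ol", "Eln", "hA"), which is a plain-text convention adopted here and not something this specification prescribes, and the category names returned are informal labels.

```python
def classify(symbol: str) -> str:
    """Classify one Metalanguage symbol by its letter case alone."""
    if not symbol:
        raise ValueError("empty symbol")
    # h followed by one non-lower-case character: a one-character literal.
    if len(symbol) == 2 and symbol[0] == "h" and not symbol[1].islower():
        return "single-character literal"
    if symbol.isalpha() and symbol.islower():
        return "primitive set" if len(symbol) == 1 else "primitive subset"
    if len(symbol) == 1 and symbol.isupper():
        return "structure set"
    if symbol[0].isupper() and symbol[1:].isalpha() and symbol[1:].islower():
        return "structure subset"
    if symbol.isalpha() and symbol.isupper():
        return "literal word"
    return "literal digit or special symbol"


def to_literal(source_token: str) -> str:
    """Write a source-language token as a Metalanguage literal."""
    if len(source_token) == 1 and source_token.isalpha():
        # A one-letter source word would collide with a generic capital.
        return "h" + source_token.upper()
    return source_token.upper()
```

For example, `classify("E")` gives "structure set" while `classify("IF")` gives "literal word", which is the distinction the capitalization rules are meant to make visible at a glance.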

One feature of BNF which has been incorporated into the new Metalanguage
is the use of English language enclosed in corner brackets < > to
describe language requirements. This permits a deferment of the design
of actual code for a language until after all the specifications have
been determined.

The following Tables 10 through 13 show a preliminary partial assignment
of the letters of the alphabet to sets and subsets and the Metalanguage
symbols.

Table 10 hereinbelow shows the notation used in the Metalanguage, with
definitions thereof.

Table 11 provides definitions of the lower case letters of the alphabet
and examples of their use in the Metalanguage.

Table 12 (a) defines information element sets, while Table 12 (b) shows
the subsets of various identifiers.

Table 13 defines the subsets for various operators.

TABLE 10: Metalanguage Notation

[ ]      square brackets: used to enclose an optional part of the
         format.

{ }      braces: designate a choice among alternatives listed
         vertically. One of the alternatives must be chosen unless one
         of the choices is underlined, designating it to be the choice
         by default.

___      underline: specifies default alternative.

|        vertical bar: separates alternatives. Used in simple
         definitions.

< >      corner brackets: enclose English language used to describe or
         name a source language or intermediate language element,
         structure or specification.

u <i>    the number, i, inside the corner brackets specifies the number
         of consecutive occurrences of the unit, u.

. . .    repetition symbol: specifies that the immediately preceding
         unit (element, structure, or bracketed group) may occur a
         number of times in succession.

XX. . .  metalanguage literal: source language word of two or more
         characters with all letters capitalized, or else a source
         language special symbol (character other than a letter or
         digit).

hx       metalanguage literal specifying one source language character.
         It is specified by a lower case letter, h, subscripted by the
         character being specified. If the character is a letter, it
         must be shown as a capital letter. When no possibility of
         confusion or ambiguity results, the source language symbols
         can be written as they occur without subscripting an h. The
         only exception known at present is the repetition symbol
         (. . .), which is a legal symbol in the JOVIAL language.
         Specification of this symbol for JOVIAL will be by use of the
         subscripted h (h . . .).

X        single capital letter: metalanguage notation for a generic
         term specifying a member (any member) of a set of
         multi-element information units such as statements (specified
         by the capital letter, S) and expressions (specified by the
         letter, E).

Xx. . .  capital letter followed by one or more lower case letters:
         specifies a multi-element member of a subset of the set
         specified by the capital letter. For example, Eln is the
         generic name for a member of the subset, Logical Numeric
         Expression.

x        single lower case letter: metalanguage notation for a generic
         term specifying a member of a set of primitive single element
         information units such as operators (specified by the lower
         case letter, o) and variables (specified by the lower case
         letter, v).

xu. . .  subscripted lower case letter: metalanguage generic term for a
         primitive single element which is a member of the subset
         xu. . . of the primitive element set specified by x. For
         example, ol is the generic term for a logical operator. The
         set of logical operators is a subset of the set of all
         operators, o.

         Because there are so many subsets of identifiers, the major
         subsets have been assigned letters without subscripts in order
         to further reduce the clutter. Examples include i for integer,
         w for reserved word, v for variable. These letters are
         subscripted as above to form subsets.

TABLE 11: Definitions

Letter  Used as Element             Used as Subscript            Examples

a       address label               non-numeric (alphanumeric)   ma
                                    arithmetic                   ona
                                    address                      oa
b       blank                       binary                       vb
c       control identifier          condition                    mc
                                    complex                      kc, vc
d       data attribute              dimension                    dd
                                    decimal (fixed point)        vd, kd
e       element                     external unpacked            ve
f       file name                   flag                         af
                                    prefix                       onf
                                    floating-point               kf, vf
                                    function                     ofn
g       group (alternative)
h       hollerith (literal)         hollerith                    mh
                                    hexadecimal                  vh, kh
i, j    index variable (integer)    inverse (indirect address)   oi
                                    fixed point                  vi
                                    instruction                  owi
k       constant                    conditional variable         vk
l                                   logical                      ol, vl
m       data element
n       identifier, label, name     numeric                      mn, on
o       operator                    optional                     owo
                                    offset                       ao, oo
p       parameter                   pointer                      ap
                                    picture                      dp
r                                   relational                   orel
s                                   statement                    as
                                    sign                         ofs
t       next instruction address    type                         dt
u                                   unsigned                     vu
v       variable
w                                   reserved word                cw, dw, ow
x       unknown element

TABLE 12

(a) Information Element Sets

e   = <element>
    = n | o
n   = <identifier>
o   = <operator>

(b) Identifier Subsets

n   = <identifier: name, label, operand>
    = a | c | d | m
a   = <address identifier: statement label, file>
c   = <control identifier>
d   = <data attribute>
m   = <data identifier>

a   = af | ao | as | f | t | ap
af  = <flag label>
ao  = <offset address>
ap  = <memory address (pointer)>
as  = <statement label, subroutine name>
f   = <file name>
t   = <next instruction address>

c   = b | cw
b   = <blank>
cw  = <reserved word used for control>

d   = dd | dp | dw | dt
dd  = <dimension>
dp  = <picture>
dw  = <reserved word specifying attribute>
dt  = <type>

m   = ma | mn | vc | vl
ma  = <non-numeric data>
mn  = <real numeric data>
vc  = <complex variable>
vl  = <logical variable>

ma  = kwa | mc | mh | vk
kwa = <non-numeric figurative constant>
mc  = <condition name>
mh  = <hollerith word>
vk  = <conditional variable>

mn  = i | j | kn | vn
i   = <integer variable, index>
j   = <integer variable, index>
kn  = <real numeric constant>
vn  = <real numeric variable>

kn  = kb | kd | kf | kh | kwn
kb  = <binary constant>
kd  = <decimal (fixed point) constant>
kf  = <floating-point constant>
kh  = <hexadecimal constant>
kwn = <numeric figurative constant>

vn  = vb | vd | ve | vf | vh | vi
vb  = <binary variable>
vd  = <decimal variable>
ve  = <external unpacked numeric variable>
vf  = <floating-point variable>
vh  = <hexadecimal variable>
vi  = <fixed-point variable>
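
Because each definition in Table 12 names a set in terms of the next lower level only, the whole table can be held as a simple parent-to-children map and walked to recover the level of any symbol. The Python sketch below is an illustration under stated assumptions: subscripts are written inline (so the floating-point constant is spelled "kf"), and the function name `path_to` is a hypothetical helper, not part of the Metalanguage.

```python
from typing import List, Optional

# Parent -> alternatives, transcribed from Table 12 (a) and (b).
SUBSETS = {
    "e": ["n", "o"],
    "n": ["a", "c", "d", "m"],
    "a": ["af", "ao", "as", "f", "t", "ap"],
    "c": ["b", "cw"],
    "d": ["dd", "dp", "dw", "dt"],
    "m": ["ma", "mn", "vc", "vl"],
    "ma": ["kwa", "mc", "mh", "vk"],
    "mn": ["i", "j", "kn", "vn"],
    "kn": ["kb", "kd", "kf", "kh", "kwn"],
    "vn": ["vb", "vd", "ve", "vf", "vh", "vi"],
}


def path_to(symbol: str, root: str = "e") -> Optional[List[str]]:
    """Return the chain of sets from root down to symbol, or None."""
    if root == symbol:
        return [root]
    for child in SUBSETS.get(root, []):
        below = path_to(symbol, child)
        if below is not None:
            return [root] + below
    return None
```

The length of the returned chain is a direct measure of how far down the hierarchy a symbol sits, which is the property the notation is intended to expose.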

TABLE 13: Operator Subsets

o    = <operator>
     = oa | oc | ol | on | ow
oa   = <address operator>
oc   = <control operator>
ol   = <logical operator>
on   = <numeric operator>
ow   = <reserved word operator>

oa   = oi | oo | op
oi   = <inverse address>
oo   = <offset>
op   = <pointer>

oc   = b | oca | ocn | ot
b    = <blank>
oca  = <non-numeric control>
ocn  = <numeric control functions>
ot   = <go to next instruction>

ol   = olp | olr
olp  = <logical prefix (NOT)>
olr  = <logical relation (AND, OR)>

on   = ona | onf | orel
ona  = <arithmetic>
onf  = <numeric prefix>
orel = <relational>

onf  = ofn | ofs
ofn  = <numeric function>
ofs  = <sign>

ow   = owi | owo
owi  = <reserved word instruction>
owo  = <optional reserved word instruction>

Alternate Groupings

of   = <prefix operator>
     = ofn | ofs | olp

Operator Subsets

oa   = oi | oo | op
oc   = b | oca | ocn | ot
oca  = <concatenation> | EOE | ,
ocn  = <function code> | <subscript code> | ( | ) | ,
of   = ofn | ofs | olp
ofn  = SIN | COS | SQRT | <other functions>
ofs  = + | -
oi   = .I.
ol   = olp | olr
olp  = NOT | .NOT.
olr  = AND | OR | .AND. | .OR.
on   = ona | ofn | ofs | orel
ona  = + | - | <×> | <÷> | <exp.>
onf  = ofn | ofs
oo   = .O.
op   = .A.

orel = < | ≤ | = | ≠ | ≥ | > | ¬< | ¬= | ¬>

ow   = owi | owo
owi  = IF | GOTO | DO | <any other reserved word instruction>
owo  = WITH | THEN | AT | ADVANCING | <any other optional reserved word>
ot   = <go to next instruction>
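
The literal definitions in Table 13 can be used directly to label operator tokens taken from a source statement. The Python sketch below is an illustration only: subset names are written with inline subscripts (for example "olr"), only the literals that are legible in Table 13 are included, and `operator_classes` is a hypothetical helper name. A token such as "+" belongs to more than one subset (sign and arithmetic), so the function returns every candidate and leaves the choice to context.

```python
from typing import List

# Subset name -> source-language literals, taken from Table 13.
OPERATOR_LITERALS = {
    "ofn": {"SIN", "COS", "SQRT"},
    "ofs": {"+", "-"},
    "olp": {"NOT", ".NOT."},
    "olr": {"AND", "OR", ".AND.", ".OR."},
    "ona": {"+", "-"},
    "owi": {"IF", "GOTO", "DO"},
    "owo": {"WITH", "THEN", "AT", "ADVANCING"},
}


def operator_classes(token: str) -> List[str]:
    """Return every operator subset whose literal list contains token."""
    word = token.upper()
    return sorted(name for name, literals in OPERATOR_LITERALS.items()
                  if word in literals)
```

Note that "AND" and ".AND." both resolve to the single subset "olr", so two different source spellings are recognized as the same information unit.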
TABLE 14: Information Structure Sets

(The label on the left represents a member (any member) of the set 
whose members meet the requirements specified on the right) 

A = < storage structures > 

C = < operating system routine > 

D = <set of data attributes > 

E = < expression — in IL — 1, it consists of an alternating sequence 
of identifiers and operators > 

F = < Channels, hardware interrupts, hardware indicators such 
as condition and overflow flags > 

G = < Group composed of assorted structures from various subsets 
such as executable program statements, control instructions, 
expressions, elements. > 

H = < Library > 

M = <data structure > 

N = <identifier dictionary (symbol table) > 

O = < set of operators > 

P = < procedure, program, sub-routine >

W = < reserved word dictionary > 

S = < statement, including delimiter > 

T = < decision table > 

V = < vector — n — dimension variable > 
Expressions



E 



EI 
Eln 



Ea 



Efh 



Ec 



TABLE 15: Information Structure Subsets 
<expression (parenthesis must be paired) > 

[of. . .] [(. . .] n [o [(. . .] [of. . .] [(. . .] nQ. . .]] 

Ela | Eln | Era | Ern | Ea | En | Efn | Ec | El 
Ela | Eln 

< logical numeric (decimal) expression > 



Jvl 1 ol fvl "I 
\Ern J \Ern J 



Ern = 



En = 



< relational numeric expression> 
En [or En] . . . 

<Tnumeric (arithmetic) expression: 



Ela 



Era = 



[onf] [(. . .] mn [ona[(. . .] [onf. . .] [(. . .] mnD- . .fj 
< logical expression of alphanumerics> 

[Era J [ol l^Era J ]. . . 



Era J [ol 

<relational expressions of alphanumeric^ > 
= Ea [or Ea] . . . 
= < alphanumeric expression > 



of 
oa 

= [of] ofn En 



[(. . .]ma 



[OCSL J 



•] [of] [(. . .] ma[). • .] 



El 

n ol ncl 
vk ol kn 



fol 



{mc }, 
TABLE 16: Information Structure Subsets



Groups 



G = <group composed of assorted structures from various subsets 
such as executable program statements, control instructions, 
expressions, elements. > 

= Gt | Gf 

Gti = <instruction to be executed if conditional expression, Eci, is
true. If Gti is executed, Gfi is by-passed>



- [«■■ ■ • ]. {?- } 



Gfi = <instruction to be executed if conditional expression, Eci, is false>



[*■• • • ] {*- } 



<the subscript, i, denotes the ith level of nested IF statements > 

ot = <specifies that the next instruction to be executed during
program execution is the statement which follows the entire
"IF" statement including all nested "IF" statements.
It is an instruction to the compiler to generate coding as
needed to carry out this instruction execution sequence>



TABLE 17: Information Structure Subsets 



Data Structures 



M  = <data structure>

m  = <single identifier of data (memory)>

Ma = <array of data>

Mf = <data in a file>

Mg = <a generation of hierarchical data>

Mh = <hierarchy of data>

Ms = <a string of data>

Mt = <table of data>

TABLE 18: Information Structure Subsets

Statements 

S = < statement (including delimiter) > 

= Sc | Sd | Se 

Sc = < control instruction > 

Sd = <data declaration statement> 

Se = < executable statement > 

= Sb | Sx | Si 

Sb = < unconditional branch > 

Si = <IF statement > 

= IF Ec Gt owi Gf 

Sx = <any executable statement except an unconditional branch or 
an IF statement > 



Table 14 shows the assignment of upper case alphabet letters to the
major subsets in Metalanguage. Table 15 defines the major subset
"expressions" in terms of the next lower level in the Metalanguage
notation.

Table 16 defines the major information subset "G" (groups) and how it is
represented in Metalanguage.

Table 17 shows the assignment of suffixed capital letters to specify the
various major subsets of data structures in Metalanguage notation.

Table 18 shows the assignment of labels to the major types of statements
which belong to the set "S" (statements), another major information
structure subset.

The Metalanguage developed herein is organized so as to define a
hierarchy of concepts (useful in programming and computers) which are
organized according to levels of complexity and sophistication, so that
there is an easy recognition of the approximate hierarchical level of
each concept.

For example, with reference to Fig. 6, certain useful information units
of various complexity are shown in hierarchical relationship to each
other by the line-tree drawing. The P represents a program, procedure or
set of programs. Since any statement is a member of the set S of all
statements, and since S is a subset of P, S is linked to P. Other
subsets of P are not shown. Since statements contain expressions, there
is a further link to E, which represents expressions, showing that E is
a subset of S. Since expressions are composed of elements, there is a
further link to e, which represents elements.



The set of all elements breaks down into the subsets: identifiers, n;
and operators, o.

The identifier set, n, may be subdivided into various subsets as shown
by small letters a, b, c, f, g, h, i, d, m and the variable v, each of
which can be further subdivided as, for example, the variable v into vb,
vc, vd, vf, vi, vr.

Other subdivisions of the element set, e, might be represented by na and
nb, etc.

The operator set, o, may be subdivided as seen in Fig. 6 into oa, oc,
. . . etc.

Fig. 7 shows a further organization of hierarchies of information in the
Metalanguage. In Fig. 7 the letter C might represent the generic term
for any member of the set "control or operating system"; the letter M
might represent the generic term for any member of the set "data
structure"; the letter A may represent an area of storage; the letter P
may represent a program; the letters S, G, and T, which link to P,
represent members of various sections or pieces (subsets) of the program
P. Each of these symbols is further broken down into organizations
involving less complex and less complicated units of information.

Table 19 shows a COBOL "IF" statement specification which is written in
IBM notation. It should be noted that some of the terms in COBOL are
never explicitly specified and defined in the reference manuals, as for
example, the term "NEXT SENTENCE".

Table 20 illustrates the COBOL "IF" statement specification as written
in Backus Normal Form (BNF).

TABLE 19

COBOL "IF" Statement Specification: Written in IBM Notation

                                     {GREATER THAN}
IF arithmetic-expression-1 IS [NOT]  {EQUAL TO    }  arithmetic-expression-2
                                     {LESS THAN   }

    [THEN]  {statement-1 . . .}   {ELSE     }   {statement-2 . . .}
            {NEXT SENTENCE    }   {OTHERWISE}   {NEXT SENTENCE    } .

Note: specifications for some of the terms are buried in the text of the
manual. Other terms such as NEXT SENTENCE are never explicitly
specified.

TABLE 20

COBOL "IF" Statement Specification: Written in BNF

<symbol>   ::= IF <arithmetic expression> IS

<symbol 1> ::= <symbol> | <symbol> NOT

<symbol 2> ::= <symbol 1> GREATER THAN | <symbol 1> EQUAL TO |
               <symbol 1> LESS THAN

<symbol 3> ::= <symbol 2> <arithmetic expression>

<symbol 4> ::= <symbol 3> | <symbol 3> THEN

<symbol 5> ::= <symbol 4> <executable statement> |
               <symbol 4> NEXT SENTENCE

<symbol 6> ::= <symbol 5> ELSE | <symbol 5> OTHERWISE

<COBOL IF> ::= <symbol 6> <executable statement> |
               <symbol 6> NEXT SENTENCE

Table 21 shows the COBOL "IF" statement specification as written in the
newly developed Metalanguage.

TABLE 21

COBOL "IF" Statement In New Metalanguage

                                     {ELSE     }
IF En1 IS [of] orel En2 [THEN] Gt    {OTHERWISE}    Gf



where: 



Gt 

o t = 

Gf = 

o t = 



<go to next statement following Gf> 
[Sx. . .] 

<go to next statement following Gf > 



Lot J 



Figure 10 shows generically how the "IF" statement is represented in
Metalanguage. The rectangular blocks represent "strings of information
units". Si is a generic "IF" statement and compares to the Table 21
statement.
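
The compactness of the Table 21 form can be illustrated by turning it into a recognizer. The Python sketch below is a simplified illustration and not a COBOL parser: it assumes each numeric expression is a single blank-free token, it accepts only the three relation phrases listed in Table 19, and it treats NEXT SENTENCE as the "ot" operator, which is one reading of the tables. The dictionary keys reuse the Metalanguage symbols with inline subscripts; `parse_if` is a hypothetical helper name.

```python
import re
from typing import Optional

# IF En1 IS [of] orel En2 [THEN] Gt {ELSE | OTHERWISE} Gf
IF_RE = re.compile(
    r"IF\s+(?P<en1>\S+)\s+IS\s+(?P<prefix>NOT\s+)?"
    r"(?P<orel>GREATER THAN|EQUAL TO|LESS THAN)\s+(?P<en2>\S+)\s+"
    r"(?:THEN\s+)?(?P<gt>.+?)\s+(?:ELSE|OTHERWISE)\s+(?P<gf>.+?)\.?\s*$"
)


def parse_if(sentence: str) -> Optional[dict]:
    """Split a simple COBOL IF sentence into its Table 21 parts."""
    match = IF_RE.match(sentence.strip())
    if match is None:
        return None

    def group(text: str) -> str:
        # Assumed reading: NEXT SENTENCE stands for the operator ot.
        return "ot" if text == "NEXT SENTENCE" else text

    return {
        "En1": match["en1"],
        "of": "NOT" if match["prefix"] else None,
        "orel": match["orel"],
        "En2": match["en2"],
        "Gt": group(match["gt"]),
        "Gf": group(match["gf"]),
    }
```

A call such as `parse_if("IF A IS NOT GREATER THAN B THEN MOVE X TO Y ELSE NEXT SENTENCE.")` returns one entry per information unit named in the one-line Metalanguage form, whereas the BNF version of the same statement needs eight productions.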

In Fig. 10, an information unit such as 
Eln is connected in terms of lower order 
hierarchies of information as shown in the 
Metalanguage notation. 

Use of Metalanguage Format

The new Metalanguage provides a means of specifying the information
content of various source languages, as well as the first Intermediate
Language, IL-1, and the second Intermediate Language, IL-2, in a
systematically labeled common notation which defines information units
in a precise, concise way. Therefore, identical information units can be
easily identified. In addition, the hierarchical level of information
units (degree of complexity or quantity of information) can be
recognized without knowing all the elementary details of the information
unit involved.

As a result of this tool (the new Metalanguage), source language
statements in various languages can be written in the new Metalanguage
for study and comparison to determine similarities and differences in
the functions provided by the various source languages, so that the
intermediate language functions can be designed in an efficient manner.
Without this Metalanguage, it would be necessary to rely heavily on
remembering all the complications, restrictions, and specifications for
the information units in each of the source languages for a statement
being studied. In addition, if this statement were to be reconsidered at
a later time, one would again have to refresh one's memory on each of
the source languages.



As a result of the work on developing in- 
termediate languages, parameters have thus 
been identified and organized at various levels 
of an information hierarchy as follows: 

(a) At least 15 major information classes 
have been established for large multi-element 
information units such as statements (S) and 
expressions (E). 

(b) At least 35 major subsets of the above 
groups have been established - - as multi- 
element units such as logical expressions (El). 

(c) 18 principal primitive types have been 
established - - as sets of single elements of 
information such as operators (a), variables 
(v), etc. 

(d) 44 subsets of single element groups 
have been established - - of which examples 
include types of operators such as arithmetic 
operators, logical operators, and control oper- 
ators. Reference may be made to Figs. 6 and 
7 in this regard. 

One of the discoveries made in regard to the development of the
information hierarchy and the new Metalanguage notation may be
illustrative of the discoveries involved. In the early stages of this
work, all "reserved words" (including instructions) were classified as
"identifiers". Also, "instructions" were considered identifiers of call
routines. After work with these original classifications, it was found
from experience that those "reserved words" used as instructions or
commands, and "instructions", should be designated and defined as
"operators" rather than as identifiers.

Hereinbelow will be found a group of tables which illustrate how various
statements from major source languages can be written in the new
Metalanguage notation, how they transform into statements in the first
and second intermediate languages, and how they may be mapped.

The following tables, Tables 15 to 24, show the representation of
various COBOL and FORTRAN functions in Metalanguage notation:

Table 15 shows representative functions which are found in high level
languages in the left column, while the equivalent function in IL-1
format is shown in the right hand column.



TABLE 15

Representative Functions      Representation in IL-1 Format
Found In Programs             Using Metalanguage Notation

Logical IF                    IF El Gt Gf
Relational                    IF El Gt Gf
Sign (COBOL)                  IF El Gt Gf
Sign (FORTRAN)                IFS En S1 S2 S3
Class                         IFCL n ncl Gt Gf
Condition                     IFC f vk g Gt Gf
On Count                      ON ki1 ki2 ki3 Gt Gf
Unconditional Branch          BR as
Alter (assigned Branch)       ALTER as-i as
                              GO as, as1, as2, . . .
Computed Branch               GO TO i, as1, as2, . . . asj
Iteration                     DO cl a1 a2 vn1 mn1 mn2 mn3 El1
                                 [a3 a4 vn2 mn4 mn5 mn6 El2
                                 [a5 a6 vn3 mn7 mn8 mn9 El3]]
Assignment                    ASGNMT ow crd M E Se mn4



Table 16 shows the Conditional Branch (IF) of source language FORTRAN
and source language COBOL in Metalanguage notation. Table 16B for COBOL
has subportions (a), (b), and (c) showing, in Metalanguage notation,
three forms of conditional branch (IF) tests.

Table 16 subsection (d) shows the single IL-1 statement into which every
one of these FORTRAN and COBOL statements maps.

Subsection (e) lists the Generalized Compiler tasks for these
conditional branch (IF) statements.

Subsection (f) of Table 16 shows a typical portion of the conditional
branch IF statement in IL-2.

Subsection (g) illustrates how the conditional branch IF statement is
mapped.
TABLE 16: Conditional Branch (IF)

(compared after alignment of decimal points if all elements numeric)

(A) FORTRAN

    IF (Eln) St

(B) COBOL

(a) LOGICAL TEST

    IF El [THEN] Gt { ELSE | OTHERWISE } Gf at

(b) RELATIONAL TEST

    IF E1 IS [NOT] { > | < | GREATER THAN | EQUAL TO | LESS THAN } E2
       [THEN] Gt { ELSE | OTHERWISE } Gf at

(c) SIGN TEST

    IF En IS [NOT] { POSITIVE | ZERO | NEGATIVE }
       [THEN] Gt { ELSE | OTHERWISE } Gf at

(d) IL-1

    IFi Eli Gti Gfi at

(e) G. C. TASKS

<symbol table look-up of attributes of all elements>
<code to align decimal points if attributes of all elements are numeric>
<rearrange expression according to precedence assigned to operators>
<generate code to compute the value of the expression>
<code to test logical value>
<code to execute Gt and skip Gf if test result true>
<code to execute Gf>
<repeat above steps for each IF statement>
<repeat . . .>

<the next statement to be executed, at, is the statement following the
last nested "IF" statement. In the IL-1 format it is at. Nested "IF"
statements can appear in Gt or Gf>



(f) IL — 2 



a o n [n] 
(g) Conditional Branch (IF) 



MAP 



SOURCE 



MAPS INTO 



st 



Gti 



Gfi 



NEXT SENTENCE 

< executable statement ... 
following THEN or 

El and delimited by ELSE or 
OTHERWISE or 

a keyword designating a new 
instruction > 

"<executable statement . . . 
following ELSE or 

OTHERWISE delimited by 
the start of a new instruc- 
tion > 

E A orel E 2 

En orel O.O 



= Gt 



rsb 

[Sx. . .] 4 Si 1+ 

Lot 



[Sx. . .] 
GOTOt 



{!" } 



= Gt 



= Gf 

= E1 
= E1 
TABLE 16: (g) Conditional Branch (IF) Continued

SOURCE                  MAPS INTO (IN IL-1)

GREATER THAN            >
EQUAL TO                =
LESS THAN               <
NOT GREATER THAN        ¬>            = orel
NOT EQUAL TO            ¬=
NOT LESS THAN           ¬<

POSITIVE                >0
ZERO                    =0
NEGATIVE                <0
NOT POSITIVE            ¬>0           = orel 0.0
NOT ZERO                ¬=0
NOT NEGATIVE            ¬<0
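The mapping in subsection (g) amounts to a small lookup: each COBOL relational phrase becomes a relational operator (orel), and each sign phrase becomes a relational operator compared against 0.0. A minimal Python sketch of that lookup follows. The names `RELATION_WORDS`, `SIGN_WORDS`, and `to_orel` are hypothetical, and writing the negated operators with a leading `¬` is an assumption about the printed symbol; only the phrase-to-operator pairing is taken from the table.

```python
# Hypothetical names; only the phrase-to-operator pairs follow Table 16(g).
RELATION_WORDS = {"GREATER THAN": ">", "EQUAL TO": "=", "LESS THAN": "<"}
SIGN_WORDS = {"POSITIVE": ">", "ZERO": "=", "NEGATIVE": "<"}


def to_orel(phrase, negated=False):
    """Map a COBOL test phrase to (orel, implied right-hand operand).

    The implied operand is None for a relational test (the source
    statement supplies E2) and "0.0" for a sign test.
    """
    if phrase in RELATION_WORDS:
        symbol, operand = RELATION_WORDS[phrase], None
    elif phrase in SIGN_WORDS:
        symbol, operand = SIGN_WORDS[phrase], "0.0"
    else:
        raise ValueError(f"unknown test phrase: {phrase!r}")
    if negated:
        symbol = "¬" + symbol  # assumed notation for the NOT forms
    return symbol, operand
```

Under these assumptions, `to_orel("NEGATIVE")` yields the operator `<` with operand `0.0`, so a sign test and a relational test end up in the same `E orel E` shape that the single IL-1 IF statement expects.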



Table 17 shows the FORTRAN Sign Test stated in Metalanguage (a) and the
equivalent statements in IL-1 (b) and in IL-2 (c). Subsection (d) of
Table 17 shows the mapping.

TABLE 17: Sign Test 

(a) FORTRAN

    IF (En) S1, S2, S3

(b) IL-1

    IFS En, S1, S2, S3

(c) IL-2

    <check symbol table: all elements of E must be numeric>
    <code to compute value of En>
    COMP En 0.0
    BRL S1
    BRE S2
    BRG S3

(d) MAP

    S1 = <next statement to be executed if the value of the relational
         expression is negative.>

    S2 = <next statement to be executed if the value of the relational
         expression is zero.>

    S3 = <next statement to be executed if the value of the relational
         expression is positive.>
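The IL-2 sequence for the sign test compares the computed value with 0.0 and then branches on less, equal, or greater. A minimal Python sketch of that three-way selection is given below. The function name `sign_test` and the idea of passing the three successor statements as arguments are assumptions made for illustration.

```python
def sign_test(value, s1, s2, s3):
    """Select the next statement from the sign of an already computed value.

    s1 is chosen when value is negative, s2 when it is zero,
    and s3 when it is positive, as in the MAP subsection of Table 17.
    """
    if value < 0.0:      # branch taken on "less"
        return s1
    if value == 0.0:     # branch taken on "equal"
        return s2
    return s3            # branch taken on "greater"
```

A call such as `sign_test(x - y, "S1", "S2", "S3")` therefore plays the role of the FORTRAN statement `IF (X - Y) S1, S2, S3`.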
Similarly to Table 17, Table 18 shows the COBOL Class Test; Table
19 shows the Condition Test for COBOL; Table 20 shows the COBOL
function of Conditional Branch on Count; Table 21 shows the COBOL
and FORTRAN Unconditional Branch functions; Table 22 illustrates the
COBOL and FORTRAN Alter functions.



TABLE 18: Class Test 

COBOL

    IF n IS [NOT] { NUMERIC | ALPHABETIC } [THEN] Gt { ELSE | OTHERWISE } Gf

IL-1

    IFCL n, ncl, Gt, Gf

IL-2

    <code for character by character table look-up of class specified
    by ncl>
    BRE Gt
    EXEC Gf

TABLE 19: Condition Test 



COBOL

    IF vk IS [NOT] kn [THEN] Gt { ELSE | OTHERWISE } Gf.

    IF [NOT] { af | mc } [THEN] Gt { ELSE | OTHERWISE } Gf.

IL-1

    IFC f, vk, g, Gt, Gf

IL-2

    <look up in symbol table to determine if mc or af or kn>
    <code to find mc corresponding to kn if vk specified>
    <code to address and fetch the conditional variable field vk in the
    record currently in the file location, f>
    <compare> vk g
    BRE Gt
    EXEC Gf

MAPS

    where: g = mc | af | kn

    kn = <program supplied test value specifying one of the condition
         names which the conditional variable may represent>

    vk = <conditional variable (source program translator generates name
         and assigns a value of 1 to it in the case of forms overflow
         test)>

    <translator supplies the file name, f, of the file currently open.>

TABLE 20: Conditional Branch On Count 

COBOL

    ON ki1 [AND EVERY ki2] [UNTIL ki3] Gt [ { ELSE | OTHERWISE } Gf ]

IL-1

    ON ki1 ki2 ki3 Gt Gf

COMPILER TASKS

    <assign name to counter variable, i; add name to symbol table
    and set its value to zero>

    <default ki3 = infinity. For both ki2 and ki3 blank (execute Gt
    once) default ki2 = ki1, ki3 = ki1 + 1>

IL-2

   i = i + 1
   IF i > ki3 GO TO a4
a1 IF (ki1 - i) a2, a3, a4
a2 ki1 = ki1 + ki2
   IF (ki1 - ki3) a1, a4, a4
a3 Gt 
a4 G 
a4 Gf 
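The counter logic of Table 20 can be followed more easily as ordinary code. The Python sketch below is a reading of the IL-2 listing above, not a definitive implementation: the class name `OnCounter` and its `step` method are hypothetical, the handling of a blank ki2 together with a given ki3 is not stated in the table and is rejected here, and the requirement that ki2 be at least 1 is an added assumption to guarantee termination. Labels a1 to a4 in the comments refer to the listing.

```python
import math


class OnCounter:
    """Hypothetical model of the counter kept for one ON statement."""

    def __init__(self, ki1, ki2=None, ki3=None):
        if ki2 is None and ki3 is None:
            ki2, ki3 = ki1, ki1 + 1      # both blank: execute Gt once
        elif ki3 is None:
            ki3 = math.inf               # UNTIL omitted: no upper limit
        elif ki2 is None:
            raise ValueError("ki3 without ki2 is not covered by the table")
        if ki2 < 1:
            raise ValueError("ki2 must be at least 1")
        self.i, self.ki1, self.ki2, self.ki3 = 0, ki1, ki2, ki3

    def step(self):
        """Advance the counter once; True selects Gt, False selects Gf."""
        self.i += 1
        if self.i > self.ki3:
            return False                 # GO TO a4
        while True:
            diff = self.ki1 - self.i     # a1: three-way IF
            if diff == 0:
                return True              # a3: Gt
            if diff > 0:
                return False             # a4: Gf
            self.ki1 += self.ki2         # a2: next scheduled count
            if self.ki1 >= self.ki3:
                return False             # a4: Gf
```

With `OnCounter(2, 3, 10)`, corresponding to `ON 2 AND EVERY 3 UNTIL 10`, the sketch selects Gt on passes 2, 5, and 8. The upper bound behaves as exclusive because the listing sends both the zero and positive results of `ki1 - ki3` to a4.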
TABLE 21: Unconditional Branch

FORTRAN

    GO TO astmt

COBOL

    GO TO { apara | asec }

IL-1

    BR as

IL-2

    BR as

MAPPING REQUIREMENTS

    SOURCE      IL-1

    astmt       as
    apara       as

TABLE 22: Alter 

(a) FORTRAN 



as 



ASSIGN a B tmt-i TO a fltm t 
GO TO (a stm t i' 



(b) COBOL 



fapara ~) 

ALTER ^ apara-i TO PROCEED TO j ; 



GO TO [a P ara]. 
IL-1

    ALTER as-i, as
    GO as as1 as2

IL-2

    <check that as-i in the ALTER statement is a legal member of
    the set in the GO statement.>
    <look up in symbol table the address (loc2) of the branch to as>
    <code to move instruction at location 1 to location 2>
    <code to branch past location 1>
    loc 1: <code to branch to as-i>
    loc 2: <code to branch to as>

EXAMPLE:

    MVAR$1 3 $2 3 4
    BR $1+4
    $1 BR as-i
    $2 BR as

MAP

    SOURCE      IL-1

    astmt-i     as-i
    astmt       as
    apara-i     as-i
    asec        as
    apara       as
If apara is omitted in the COBOL statement, the GO TO statement must
have a paragraph name, be the only statement in the paragraph, and be
modified by an ALTER statement before its first execution.

Table 23 illustrates the Computed Branch functions for FORTRAN
and COBOL and, like the tables above, includes mapping requirements.
Table 24 shows some of the Assignment Statement Varieties in FORTRAN (A)
and COBOL (B), followed in sequence by the statements in IL-1 format
(c), the Generalized Compiler tasks (d), the IL-2 format (e), and the
mapping (g).

Subsection (h) of Table 24 shows mapping details applicable to specific
COBOL source statements corresponding to subsection B as related to
IL-1 format.



TABLE 23: Computed Branch 

FORTRAN

    GO TO (astmt1, astmt2, . . .), i

COBOL

    GO TO { apara1 | asec1 } [ { apara2 | asec2 } ] . . . DEPENDING [ON] i

IL-1

    GO TO i, as1, as2, . . . asj

IL-2

            BR i + *
    <i = 1> BR as1
    <i = 2> BR as2
      . . .
    <i = j> BR asj

Mapping Requirements

In COBOL, if i > j, the GO TO statement is ignored and control passes
to the next sequential statement.

* designates current instruction address.

    SOURCE      IL-1

    astmt-i     as-i
    apara-i     as-i
    asec-i      as-i
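The computed branch of Table 23 is an indexed jump: the value i selects the i-th label in the list, and in COBOL an index beyond the end of the list falls through to the next statement. The Python sketch below illustrates that selection. The function name `computed_branch` is hypothetical, and treating an index below 1 the same way as an index above j is an assumption; the table states only the `i > j` case.

```python
def computed_branch(i, targets):
    """Return the i-th target (counting from 1), or None to fall through.

    None stands for "ignore the GO TO and continue with the next
    sequential statement".
    """
    if 1 <= i <= len(targets):
        return targets[i - 1]
    return None  # out of range: fall through (i < 1 handling is assumed)
```

For instance, `computed_branch(2, ["as1", "as2", "as3"])` selects `as2`, which corresponds to the jump table entry reached by `BR i + *` when i is 2.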
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TABLE 24: Assignment Statement 

(A) FORTRAN 



(1) vl = El 

(2) vn = En 
(B) COBOL 



{:} 



(3) COMPUTE 4 I [ROUNDED] 



-CH" 



; [ON] SIZE ERROR Se . . . 

(4) ADD mn2 . . . TO mn1 [ROUNDED]
    [; [ON] SIZE ERROR Se . . .]

(5) ADD mn2 mn2 . . . GIVING mn1 [ROUNDED]
    [; [ON] SIZE ERROR Se . . .]

(6) ADD CORRESPONDING Mn2 TO Mn1 [ROUNDED]
    [; [ON] SIZE ERROR Se . . .]

(7) SUBTRACT CORRESPONDING Mn2 FROM Mn1 [ROUNDED]
    [; [ON] SIZE ERROR Se . . .]

(8) SUBTRACT mn2 . . . FROM { vn1 [GIVING vn] | kn GIVING vn }
    [ROUNDED] [; [ON] SIZE ERROR Se . . .]

(9) SUBTRACT mn2 FROM mn1 [ROUNDED]
    [; [ON] SIZE ERROR Se . . .]

(10) MULTIPLY mn2 BY { vn1 [GIVING vn] | kn GIVING vn }
     [ROUNDED] [; [ON] SIZE ERROR Se . . .]

(11) DIVIDE mn2 INTO { vn1 [GIVING vn] | kn GIVING vn }
     [ROUNDED] [REMAINDER mn4] [; [ON] SIZE ERROR Se . . .]

(12) DIVIDE mn3 BY mn2 GIVING vn [ROUNDED] [REMAINDER mn4]
     [; [ON] SIZE ERROR Se . . .]

IL— 1 

ASGNMNT ow crd M E Se mn4 
G. C. TASKS 



<process expression, E, to transform it from algebraic notation into
an operation sequence based on standard precedence rules>

<the symbol table is provided with the updated length of all
expressions, E, and executable statement groups, Se>

<if the source language precedence rules are not standard, the
source language translator must add parentheses as needed to
convey the correct precedence information>

IL — 2 



a [n] o n 
GENERAL

    SOURCE LANGUAGE             IL-1

    vl                          M
    vn                          M
    mr                          M
    vn1 <with vn blank>         M
    mn1                         M
    = El                        E
    = En                        E
    = kn                        E
    ROUNDED                     set crd = 1
    ON SIZE ERROR Se . . .      Se
    element expression          ow = 0
    ADD CORRES                  ow = 1
    SUBT. CORRES                ow = 2
    CONCATENATE                 ow = 3
    SEPARATE                    ow = 4

    Mn1 <= mn1-i . . .>         M
    Mn2 <= mn2-i . . .>         E <Ei [o Ej] . . .>

    SOURCE STATEMENT NO.

    (4)   mn2 [+ mn2] . . . + mn1       E
    (5)   mn2 {+ mn2} . . .             E
    (8)   { vn1 | kn } - mn2 . . .      E
    (9)   mn1 - mn2                     E
    (10)  mn2 * { vn1 | kn }            E
    (11)  { vn1 | kn } / mn2            E
    (12)  mn3 / mn2                     E
Operation

In the preceding descriptive sections, there has been presented a
computer system which is enhanced by an optimal use of a generalized
compiler. Fig. 1 shows a flow diagram whereby the source program in
high level language is communicated to a source language translator,
SLT 10, wherein there is a separation of non-executable information and
executable statements (in the IL-1 language format). Then the
non-executable information is mapped into the Symbol Table 13 of the
Generalized Compiler 11, and the executable statements are communicated
to Problem Transformation Module 12. Subsequently, this information is
communicated from the Generalized Compiler 11 to the Machine Code
Translator MCT 14, which provides an output of the source program in
primitive operation sequence code.
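The separation performed by the source language translator can be pictured with a toy example. The Python sketch below is purely illustrative: the `DECLARE name attribute` line format, the class `TranslatorOutput`, and the function `source_language_translator` are all hypothetical stand-ins, chosen only to show non-executable information going into a symbol table while executable statements are passed on for problem transformation.

```python
from dataclasses import dataclass, field


@dataclass
class TranslatorOutput:
    symbol_table: dict = field(default_factory=dict)  # non-executable information
    executable: list = field(default_factory=list)    # statements for the next stage


def source_language_translator(lines):
    """Toy SLT: route DECLARE lines to the symbol table, the rest onward."""
    out = TranslatorOutput()
    for line in lines:
        words = line.split()
        if not words:
            continue  # skip blank lines
        if words[0] == "DECLARE":
            # Hypothetical declaration syntax: DECLARE <name> <attribute>
            if len(words) != 3:
                raise ValueError(f"malformed declaration: {line!r}")
            out.symbol_table[words[1]] = words[2]
        else:
            out.executable.append(words)
    return out
```

Given the lines `DECLARE X numeric` and `ADD X TO Y`, the sketch records `X` in the symbol table and forwards only the ADD statement, which mirrors the division between Symbol Table 13 and Problem Transformation Module 12 in Fig. 1.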

Fig. 2 illustrates a more specific embodiment showing how each high
level source language (such as COBOL, FORTRAN, etc.) is provided with
its own specific source language translator (SLT), each designated by
the numeral 10 with a distinguishing subscript. Further, Fig. 2 also
shows that each host computer, such as No. 1, No. 2, and No. 3, is
provided with its own individualized program package (MCT), designated
as 14m1, 14m2, and 14m3.

Figures 11 through 14 show various con- 
figurations of the system which can be used 
to suit particular requirements. 

Fig. 11 shows a computer system configuration using the GC for the
conventional, current processing sequence of compiling a source program
into an object program in machine code for a specific current type
machine, the object program having been processed to the extent that it
is ready for the computer to link, load, and execute the program to
process the data input supplied at run-time. When said object program is
not being used to control the processing of data, it is stored on cards,
tape, disc files, or other secondary storage media. In Fig. 11 a
sequential system of program operations is shown wherein the source
program 30 is operated on by the Source Language Translator 10. The
program, in IL-1 format, is then operated upon by the Generalized
Compiler 11 having an output program in IL-2 format. This program is
processed by MCT 14 to provide an object program 42 which is in machine
code for a specific host computer. Data, as represented in block 35, is
combined for operation with the object program 42, this object program
being the source program stored on cards, tape, or disc at times
occurring between executions of the object program. Following this are
the run-time machine code execution routines designated as 44, which
provide the output designated as block 45.

This configuration replaces the standard or conventional compiler
functions where great amounts of memory and computer time are used to
accomplish compilation by means of a separate compiler for each source
language.

In Fig. 11 the block designated as the object program 42 is the output
of the compiler, which can be stored on cards, tape, or disc with no
necessity for actual execution of the program. In other words, it may be
held and stored for later execution as in conventional, current computer
systems. Line 39 indicates that, subsequently, at some convenient time,
the object program 42 may be utilized for execution in combination with
the data block 35.

Fig. 12 shows a computer system configuration using the Generalized
Compiler in which the source program is processed into an object program
and a symbol table at the IL-2 level. These are stored in secondary
storage until the program is to be executed. In this configuration, IL-2
is the implementation language of the target or host computer, which
means that the computer has run-time interpretive routines which can
execute the object program directly. Run-time interpretive routines
perform the final translation of IL-2 into primitive machine codes as
necessary, and provide for type conversion (for instance, integer to
floating point) as necessary, based on data attributes furnished in a
symbol table, or as part of the input data, or in a data base.

Object programs which are in IL-2 language are less vulnerable to
obsolescence or change than a conventional object program written in
machine code for a specific machine. Object programs in IL-2 can be used
on any of a class of machines without recompiling the source programs.
Recompiling would be necessary in the event the new machine used a
different implementation language. However, the implementation language
does not change as often as machine codes since the implementation
language, being IL-2, is designed to be machine-independent for a class
of machines.

In Fig. 12 source program 30 is processed by SLT 10 and GC 11 into an
object program in IL-2, designated block 42. Thus, this object program
in IL-2 can later be executed in any one of a class of computers. Using
Run Time Interpretive routines 43 to process the data block 35 as per
the object program instruction sequence and the Symbol Table 13' (which
may be in a data base), an Output 45 of processed data may be
efficiently realized.

Figs. 13 and 14 illustrate computer system configurations which differ
from each other only in that in Fig. 14 a data base 36 is used rather
than a Symbol Table as in Fig. 13. These particular computer systems
(Fig. 13 and Fig. 14) illustrate an approach to computer architecture
that is not feasible without the Generalized Compiler concept. This
approach visualizes the general use of large common data bases and the
elimination of data declaration from a large class of source programs,
thereby considerably simplifying the writing of source programs. In
conjunction with the data base, a separate compiler can be used to
update the data declaration information as needed.

Because the Generalized Compiler is designed to be used with various
source languages, making it a frequently used part of the computer
system, and because of the advent of LSI technology reducing the cost of
hardware-firmware, it is feasible to consider implementing the various
subroutines of the Generalized Compiler on LSI chips. This use of high
speed circuitry could increase the compile speed by an order of
magnitude, thereby doubling the total throughput of a computer system.

In the configuration of Figs. 13 and 14, the source program is
translated by SLT 10 (which might also be implemented in
hardware-firmware) into an object program 41 in IL-1 and a Symbol Table
13 (unless a data base 36 is being used). The object program 41 and
Symbol Table 13 are stored in secondary storage until execution of the
program is desired. During program run-time, the Generalized Compiler 11
maps the source program statements from IL-1 into IL-2, mapping as much
as one block or procedure at a time. The run-time interpretive routines
44 process or execute these IL-2 commands in conjunction with the input
data 35 and data attribute information from the data base 36 or symbol
table 13, producing program output 44'. In the case of programs which
are to be run many times, the output of the Generalized Compiler in
IL-2, microcode, or other primitives may be stored on secondary storage
as a new object program 45', which will be used in subsequent executions
of the program, bypassing the Generalized Compiler.

The Generalized Compiler — Summary 

As a result of the language development and use to date, it is now
possible to specify, with reasonable assurance, a basic general set of
functions which can be used to describe the great majority of user
programs being written today, regardless of the source language in which
they are written. This makes possible a new simplifying approach to
compiler design.

Instead of trying to fit the compiling task into one elegant overall
concept such as a Syntax-directed compiler or a compiler-compiler, the
system described herein has divided the task according to independent
parameters. Thus, language translation is separated from problem
transformation; in other words, how the problem is stated is separated
from the problem itself. In addition, the description of the data
attributes and environment (data declaration) is separated from the
actions to be performed on the data (the executable statements).

As a result, compiler complexity has been reduced without placing new
restrictions on the user. Instead of designing a completely new compiler
for each language for each new computer, all that is required is a new
source language translator to translate from one high level language to
another when a change in source language is involved, and a new
translator to translate from one machine-level code to another when a
new computer is involved.

To implement this concept of separating problem transformation from
language translation, two intermediate languages, IL-1 and IL-2, were
developed. Nonexecutable information in IL is in the form of a symbol
table. The executable input to the Problem Transformation Module is in
high level language, IL-1, which is source language independent. The
executable output from the Problem Transformation Module is in
machine-level operation code format, IL-2, which is machine independent
for a class of machines.

As a result, the Problem Transformation Module need not be redesigned
for either a source-language or a computer change. Because of this
design stability, the Problem Transformation Module can be designed as a
hardware-firmware unit, thus reducing the work load of the CPU and the
operating system and increasing the speed of compiling.

As a further exciting possibility, the flow of jobs through a computer
may be radically changed so that data attributes are stored as part of a
common data base and are compiled and updated independently from
executable programs by a specialized compiler which only compiles data
attributes and environment information, assigns storage addresses to
data, and otherwise interfaces with the common data base. Users using
common data already in the common data base do not have to declare the
attributes. Mixed data types are converted as necessary as they are
fetched from the common data base during run time.

As another possibility, the source language can be some conversational
language which can be compiled on-line, assuming the conversational
language is a simple one-to-one translation into IL-1 and assuming that
the Problem Transformation Module is a hardware-firmware unit,
especially if the source syntax recognizer is also implemented as
firmware.

In summation, the compiler system described herein promises to make the
computer function more manageable and thereby to allow more freedom and
flexibility to develop better computer systems and user languages.

WHAT WE CLAIM IS: — 

1. Data processing system when conditioned by programming means which
includes translation means arranged to cause the data processing system
to translate a source program in a high level source language into an
intermediate language which is a high level language independent of the
source language, said translation means being effective, in operation,
to separate executable statements (as hereinbefore defined) from
non-executable information (as herein defined).

2. Data processing system according to Claim 1, wherein said
non-executable information is stored in tabular form in a symbol table.

3. Data processing system according to Claim 2, wherein operands in said
intermediate language are formed by symbol table addresses.

4. Data processing system according to any one of the preceding claims,
wherein said programming means includes compiling means arranged, in
operation, to cause the data processing system to convert said source
program from said intermediate language into a second intermediate
language which is a low level language.

5. Data processing system according to Claim 4, including means for
storing said source program in said second intermediate language.

6. Data processing system according to Claim 4 or 5, wherein said
programming means includes second translation means arranged, in
operation, to cause the data processing system to translate said source
program from said second intermediate language into a machine language.

7. Data processing system substantially as hereinbefore described with
reference to the accompanying drawings.

R. G. ROBINSON,
Chartered Patent Agent,
Agent for the Applicants.